Draft 2, Version of 11 October 2012
Oxford University Mathematical Institute
An Introduction to Pure Mathematics
Notes by Peter M. Neumann (Queen's College)
Preface
These notes are intended as a rough guide to the eight-lecture course Introduction
to Pure Mathematics, which is part of the Oxford 1st-year undergraduate course for
the Preliminary Examination in Mathematics. Please do not expect a polished account.
They are my personal lecture notes, not a carefully checked textbook. Nevertheless, I
hope they may be of some help.
Our 1st-year courses are designed to build on your previous experience and the knowl-
edge of mathematics that you have acquired from school or college. The published syn-
opsis for this one is as follows.
• The natural numbers and their ordering. Induction as a method of proof, including
a proof of the binomial theorem with non-negative integral coefficients.
• Sets: examples including the natural numbers, the integers, the rational numbers,
the real numbers. Inclusion, union, intersection, power set, ordered pairs and carte-
sian product of sets.
• Relations. Definition of an equivalence relation. Examples.
• Maps: composition, restriction; injective (one-to-one), surjective (onto) and invert-
ible maps; images and preimages.
• Rules for writing mathematics with examples.
• Hypotheses, conclusions, "if", "only if", "if and only if", "and", "or".
• Formulation of mathematical statements with examples. Quantifiers: "for all",
"there exists" .
• Problem solving in mathematics: experimentation, conjecture, confirmation, fol-
lowed by explaining the solution precisely.
• Proofs and refutations: standard techniques for constructing proofs; counterexamples.
Example of proof by contradiction and more on proof by induction.
As your primary reading to support this course we recommend
(*) Charles Batty, How do undergraduates do Mathematics? (Mathematical Institute
Study Guide, 1994).
Paper copies can be purchased from Reception in the Mathematical Institute. It is also
available on-line at http://www.maths.ox.ac.uk/files/study-guide/guide.pdf.
Further reading:
(1) Geoff Smith, Introductory Mathematics: Algebra and Analysis (Springer-Verlag,
London, 1998), Chapters 1 and 2.
(2) Robert G. Bartle and Donald R. Sherbert, Introduction to Real Analysis
(Wiley, New York, Third Edition, 2000), Chapter 1 and Appendices A and B.
(3) C. Plumpton, E. Shipton and R. L. Perry, Proof (Macmillan, London, 1984).
(4) R. B. J. T. Allenby, Numbers and Proofs (Arnold, London, 1997).
(5) Richard Earl, Bridging Course in Mathematics (Mathematical Institute website:
use the search box in http://www.maths.ox.ac.uk.)
(6) G. Polya, How to solve it (Second edition 1957, reprinted by Penguin Books, 1990).
My personal opinion is that this GREAT Classic should be in every mathematician's
personal library.
Oxford synopses provide a good guide to the content of the lectures, but it should
never be forgotten that it is the syllabus which defines the course. It is the syllabus which
should guide student learning, it is the syllabus which specifies to the examiners what
they should test, it is the syllabus which suggests to tutors what they should teach, it is
the syllabus which provides the basis of the lecture synopses.
A set of two exercise sheets goes with this lecture course. The questions they contain
will be found embedded in these notes along with a number of supplementary exercises.
These notes contain much from the notes of the 2011 version of the course. It is a
pleasure to acknowledge my great debt to Dr Robin Knight who wrote them.
I would much welcome feedback. Desy Kristianti of Oriel College has already made
useful suggestions for which I am very grateful. Please let me know of any further errors,
infelicities and obscurities. Email me at peter.neumann@queens.ox.ac.uk or write a
note to me at Queen's College.
CONTENTS
1. Introduction and advice
The Greek alphabet
2. Numbers and induction
Natural numbers and mathematical induction
Factorials, binomial coefficients, Pascal's triangle
The Binomial Theorem for non-negative integer exponents
3. Sets
Sets, examples
Notation: the sets N, Z, Q, R, C; intervals in R
Some algebra of sets: subsets, unions, intersections, set difference
Finite sets: their size or cardinality
The power set of a set
Ordered pairs; cartesian products of sets
4. Relations and functions
Binary relations
Properties a relation may have; equivalence relations
Partitions of a set
Functions
Injective, surjective, bijective functions
Algebra of functions — composition, etc.
Associativity of composition; inverses
One-sided inverses
5. Writing mathematics
Errors to avoid
The language of mathematical reasoning
The quantifiers 'for all', 'there exists'
Handling negation
Formulation of mathematical statements
6. Proofs and refutations
Direct proofs
Proof by contradiction
More on Induction: strong induction; well-ordering
Refutation
7. Problem-solving in mathematics
Some techniques and examples
Problem-solving: a summary
1 Prolegomenon
1.1 Some words of advice
We begin with a word. If you do not know it look it up. The point I want to make with
the word 'prolegomenon' is that in mathematics (as in life) we are forever coming across
words that are new to us. Do not be afraid to look them up. That is what dictionaries
and other reference works, whether on paper or on-line, are for.
Also a word of advice. Mathematics is a complex subject involving theory, problem-
solving, discovery, conjecture, proof or refutation, examples or counterexamples, applica-
tions, and much else beside. One of its characteristics, especially but far from exclusively
on the 'pure' side, is a certainty that can rarely be matched in other subjects. That
certainty comes from precision in the use of words and symbols and from precision in the
use of logical argument. Many words that we use in mathematics are everyday words
such as real, rational, complex, imaginary, set, group, ring, function, relation, to which
we give special meanings; others, such as homomorphism, isomorphism, homeomorphism,
topology, are rarely met outside of mathematics. Although there may be small local vari-
ations, generally, the mathematical community has agreed on the technical meanings of
such words. If you come across a word you have not met before, or a familiar word in an
unfamiliar context, then make sure to look it up.
1.2 Plan for these lectures
The lecture plan is more or less as listed in the synopsis. We'll begin with some basic
mathematics including numbers and mathematical induction. Then we'll move on to some
of the language of mathematics, the language of sets, relations and functions, and some
particularly important examples of each. Next we'll discuss rules for writing mathematics.
The importance of these rules or conventions cannot be over-emphasised. Mathematicians
need to be able to express their thoughts very precisely. They need to be able to write
what they mean and mean what they write. Their precise meaning needs to be understood
by the reader.
Sections 5 and 6 contain discussion of proof as it is commonly understood in math-
ematics. We will discuss some particularly common logical errors and then examine
some of the basic elements of logic used by mathematicians. I will say something about
problem-solving, something about experimenting and formulating conjectures, and about
the importance of examples and counter-examples. We'll end with a discussion of how
to discover and then refine and explain one's own proofs.
Sometimes a concept or technique might be used informally before it has been formally
defined or discussed. This is so that when we come to our formal discussion we will already
have some context to build on. In particular, although I shall not be discussing proofs
until the second week, I will feel free to show some examples right from the first lecture.
1.3 Lectures and lecture notes
Pre-prepared lecture notes are — or should be — primarily for a lecturer's own use. They
are, if you like, a script for the live event which is the lecture. Your lecturers should feel
free to say things that are not in their notes, and not say things that are in their notes.
Therefore do take your own notes in lectures. Doing so has several advantages:
(1) You get a record of what actually happened in the lecture, including explanations
that occurred to the lecturer on the spur of the moment; diagrams; questions that
members of the audience asked; the responses; what, if anything, was particularly
emphasised; what struck you personally as noteworthy;
(2) it can help fix the content of the lectures in your memory;
(3) the ability to take accurate notes is a useful life skill.
On this last point: mathematics undergraduates go on to careers of every kind. Twenty
years from now you may remember little or none of the material you studied as an
undergraduate, and you may well have no need for it, but if you get the practice now,
you should be able to take useful minutes of meetings and take accurate notes of any
seminars or presentations you attend later in life.
The taking of accurate notes is not a matter of copying what a lecturer writes on
the board. For one thing there is no point — most of our lecturers provide written notes
such as these — but for another, you will find that most lecturers go too fast for that.
The lecturer leads, the audience follows, and following inevitably involves falling behind.
Rather, try to follow the lecturer's meaning, and make notes in the real sense of that
word. Jottings. But make sure that your notes contain enough to stimulate your memory
so that you can reconstruct the narrative and the arguments for yourself afterwards. For
example, make a note of diagrams. To my great embarrassment I have been unable to
incorporate diagrams in these notes, mainly for lack of time. But a picture is worth a
thousand words, and I am sure to use diagrams in the lectures. Create your own personal
version of the lecture notes. That is a powerful technique for learning and understanding.
One final observation on lectures: do not expect to understand them instantly. If you
do, that's great; but don't worry if you feel lost or left behind. Try to retain at least some
idea of the big picture; work through your lecture notes after the lecture; then discuss
points that are still puzzling you with your contemporaries and your college tutors.
1.4 The Greek alphabet
Here is the part of the lower case Greek alphabet that is commonly used in mathematics,
not in any particular order, with names and approximate Roman equivalents:

α : alpha, a        β : beta, b         γ : gamma, g        δ : delta, d
ε or ϵ : epsilon, e   θ or ϑ : theta      φ or ϕ : phi        ψ : psi
ι : iota, i         κ : kappa, k        λ : lambda, l       μ : mu, m
ν : nu, n           ω : omega, o        π or ϖ : pi, p      ρ : rho, r
σ : sigma, s        τ : tau, t          χ : chi             ξ : xi, x
η : eta             ζ : zeta, z

A few of the upper case (capital letters) are also commonly used in mathematics:

γ ↦ Γ;  δ ↦ Δ;  θ ↦ Θ;  φ ↦ Φ;  λ ↦ Λ;  ω ↦ Ω;  π ↦ Π;  σ ↦ Σ.

You might see E (capital epsilon), but it is rare. The letter Σ and the summation symbol $\sum$
look alike, but technically they are typographically different; the same goes for the letter
Π and the product symbol $\prod$.
2 Numbers and induction
2.1 Natural numbers
Numbers are, of course, fundamental for most parts of mathematics and its applications.
Most of us are comfortable with various different kinds of numbers. Yet even at this very
basic level we need to define our terms, or perhaps simply agree usage and conventions.
Definition 2.1. A natural number is a member of the sequence 0, 1, 2, 3, . . . ob-
tained by starting from 0 and adding 1 successively. We write N for the collection
(technically, the set — see below) {0, 1, 2, 3, . . .} of all natural numbers.
Important note. Although nowadays all mathematicians use N as a standard
name for the collection of natural numbers, some authors and lecturers include 0 as a
natural number, some do not. Unfortunately, there is no standard convention. In this
course, 0 will belong to N. Just make sure when reading texts or discussing mathematics
with others that you clarify such notational conventions before you get in deep.
Natural numbers have many familiar and important properties. For example, they
can be added and multiplied — that is, if m, n are natural numbers then so are m + n and
m × n — and they may be compared: m < n if m occurs earlier in the sequence than n.
Furthermore N is well-ordered, that is, any non-empty collection of natural numbers has
a least (or first) member. Although there will be further discussion of some of these
points later (see §6.3), all this will, for the purposes of your 1st-year course (and for
much of the 2nd year too), be taken for granted. Notice, however, that a purist could
find several points of attack. In the definition I have used the terms 0, 1, 'member',
'sequence' and 'adding successively', and in the discussion I have used such terms as
'added', 'multiplied', 'occurs earlier', 'least'. Do we all agree what these mean? At this
level it would be very surprising if we did not. Nevertheless, there is an exciting area
where mathematics and philosophy come together which deals with such questions. It is
represented in the Oxford mathematics degree in the Foundations courses (Logic and Set
Theory) offered in Parts B and C.
2.2 Mathematical Induction
Theorem 2.2 [Mathematical Induction]. Let P be a property of natural num-
bers, that is, a statement P(x) about natural numbers x that may be true for some natural
numbers x and false for others. Suppose that P(0) is true, and that for all natural num-
bers n, if P(n) is true then P(n + 1) is also true. Then P(n) is true for all natural
numbers n.
I have called this a theorem, but what proof can we give? Intuitively it seems clear: we
are given that P(0) holds, then its truth implies that of P(1), the truth of P(1) implies
that of P(2), and so on. Nevertheless, it says something non-trivial. It collects together
all the individual true statements, P(0), P(1), P(2), . . . , of which there are infinitely
many, into just one true statement, namely 'P(n) is true for all natural numbers n'. It
is so basic that in some versions of the logical foundations of mathematical thinking it is
taken as an axiom.
Mathematical Induction is a technique for proving theorems. It goes with a method for
defining functions called recursion. Two typical definitions by recursion are the following:
Definition 2.3 [Powers]. Let a be a number (or a variable that takes numerical
values). Define a⁰ := 1 and then define $a^{n+1} := a^n \times a$ for n ≥ 0.

Definition 2.4 [Factorials]. Define n! for all n ≥ 0 by the rule that 0! := 1, and
thereafter (n + 1)! := n! × (n + 1) for all n ≥ 0.
These are typical of recursion in that it is used to define a function of a natural number
by specifying what value it takes at 0, and saying also how to get from the value it takes
at n to the value it takes at n + 1. The second function defined above is the familiar
factorial function, which we commonly define informally by writing n! := 1 × 2 × 3 × ⋯ × n.
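Definitions by recursion translate directly into code. Here is a minimal sketch in Python (the function names `power` and `factorial` are my own, not part of the course): each function specifies its value at 0 and says how to pass from the value at n to the value at n + 1, exactly as in Definitions 2.3 and 2.4.

```python
def power(a, n):
    """a^n by recursion: a^0 := 1 and a^(n+1) := a^n * a."""
    if n == 0:
        return 1
    return power(a, n - 1) * a

def factorial(n):
    """n! by recursion: 0! := 1 and (n+1)! := n! * (n+1)."""
    if n == 0:
        return 1
    return factorial(n - 1) * n

print(power(2, 10))   # 1024
print(factorial(5))   # 120
```

Note that the base cases a⁰ = 1 and 0! = 1 are exactly the conventions discussed below: a product of no factors is 1.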
Note that the definitions a⁰ := 1, 0! := 1 are made for good reason. It makes sense
that a product of no factors should be 1. After all, if we have a product of a number of
factors, and then add in no more factors, we do not change the product, that is, we have
multiplied it by 1. For a very similar reason, we take the sum of no numbers to be 0: if
we start with a sum of some numbers and then add no summands we do not change the
original sum, so what we have added is 0.
One use of the factorial function is to define the following extremely useful function
of two variables:
[Read the symbol := as 'to be' or as 'is defined to be'. It is quite different from =,
'is equal to', which indicates that two previously defined entities are the same.]
Definition 2.5 [Binomial coefficients]. For natural numbers m, n with m ≤ n
define $\binom{n}{m} := \dfrac{n!}{m!\,(n-m)!}$.
Famously, the binomial coefficients may be organised into an array commonly called
Pascal's Triangle, whose defining property is captured in the following lemma.
Lemma 2.6 [Pascal's Triangle]. Let m, n be natural numbers such that 1 ≤ m ≤ n.
Then
$\binom{n}{m-1} + \binom{n}{m} = \binom{n+1}{m}.$
Exercise 2.1. Write out a proof of Lemma 2.6, using just our definitions of the
factorial function and binomial coefficients, nothing more.
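This is no substitute for the written proof that Exercise 2.1 asks for, but as an illustrative check the recurrence of Lemma 2.6 can be used to build Pascal's Triangle row by row and compared with the factorial formula of Definition 2.5. A sketch in Python (`binom` and `pascal_row` are my own names):

```python
from math import factorial

def binom(n, m):
    # Definition 2.5: n! / (m! (n-m)!); the division is exact.
    return factorial(n) // (factorial(m) * factorial(n - m))

def pascal_row(n):
    # Build row n from row 0 using Lemma 2.6: C(n+1, m) = C(n, m-1) + C(n, m).
    row = [1]
    for _ in range(n):
        row = [1] + [row[m - 1] + row[m] for m in range(1, len(row))] + [1]
    return row

for n in range(8):
    assert pascal_row(n) == [binom(n, m) for m in range(n + 1)]
print(pascal_row(5))  # [1, 5, 10, 10, 5, 1]
```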
As a good illustration of how Mathematical Induction may be used we give a proof
of a very famous and important theorem:
Theorem 2.7 [The Binomial Theorem (for non-negative integral exponents)].
Let x, y be numbers (or variables that may take numerical values). Then for every
natural number n, $(x + y)^n = \sum_{m=0}^{n} \binom{n}{m} x^{n-m} y^m$.

Proof. Let P(n) be the assertion that the equation above is true for the natural
number n. Certainly P(0) is true since (x + y)⁰ = 1 while the sum on the right of the
equation has just one term, namely x⁰y⁰, which also is equal to 1.

Now suppose that P(n) is true, so that $(x + y)^n = \sum_{m=0}^{n} \binom{n}{m} x^{n-m} y^m$. Then

$(x + y)^{n+1} = (x + y)\sum_{m=0}^{n} \binom{n}{m} x^{n-m} y^m$   [by P(n)]
$= \sum_{m=0}^{n} \binom{n}{m} x^{n+1-m} y^m + \sum_{m=0}^{n} \binom{n}{m} x^{n-m} y^{m+1}$
$= x^{n+1} + y^{n+1} + \sum_{m=1}^{n} \left( \binom{n}{m} + \binom{n}{m-1} \right) x^{n+1-m} y^m,$

that is, by Lemma 2.6 (together with the definitions of $\binom{n+1}{0}$ and $\binom{n+1}{n+1}$),
$(x + y)^{n+1} = \sum_{m=0}^{n+1} \binom{n+1}{m} x^{n+1-m} y^m,$
which is the statement P(n + 1). By induction, therefore, the equation holds for all
natural numbers n, as the theorem states.
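A proof by induction and a numerical check are different things, but the latter is reassuring and easy. Here is a sketch in Python using the standard-library function math.comb for the binomial coefficients (the name `binomial_expansion` is mine):

```python
from math import comb

def binomial_expansion(x, y, n):
    # Right-hand side of Theorem 2.7: sum over m of C(n, m) x^(n-m) y^m.
    return sum(comb(n, m) * x**(n - m) * y**m for m in range(n + 1))

# Compare both sides of the theorem for some sample values.
for n in range(8):
    assert (3 + 5)**n == binomial_expansion(3, 5, n)
print(binomial_expansion(1, 1, 4))  # 16, i.e. 2^4
```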
Exercise 2.2. Write careful proofs by induction of formulae for $\sum_{m=0}^{n} m$,
$\sum_{m=0}^{n} m^2$ and $\sum_{m=0}^{n} m^3$.
Exercise 2.3. Show that if $S_k(n) := \sum_{m=0}^{n} m^k$ then $S_k(n)$ may be expressed as a
polynomial of degree k + 1 in n, whose leading term is $\dfrac{n^{k+1}}{k+1}$.
Exercise 2.4. The following famous argument purports to show that all natural
numbers are small. 'Certainly 0 is small. If n is small then also n + 1 is small.
Therefore by induction all natural numbers are small.' What is missing?
Note: I have learned from Desy Kristianti of Oriel College that this is known as
Wang's Paradox. It is not what I would call a paradox. When my friends and I were
shown it by one of our schoolteachers we all treated it as a joke. I still think of it as a
joke — though a joke with a serious moral. The moral is: make sure to define your terms
precisely.
Exercise 2.5. Show that $\sum_{m=0}^{n} \binom{n}{m} = 2^n$, and that
$\sum_{\substack{0 \le m \le n,\\ m \text{ even}}} \binom{n}{m} = \sum_{\substack{0 \le m \le n,\\ m \text{ odd}}} \binom{n}{m} = 2^{n-1}.$
Exercise 2.6. Show that $\dfrac{4^n}{2n} < \binom{2n}{n} < 4^n$ if n ≥ 2. How far can you refine these
inequalities?
3 Sets
Set theory, since its introduction about 120 years ago, has become a sophisticated part
of mathematics with deep theorems that can be studied in Parts B and C of the Oxford
degree. The basic concept is simple, however, and the beginnings of the subject provide
a precise and convenient language that has revolutionised the way mathematicians think
about and present their arguments.
3.1 Sets, examples of sets
We begin with a classic description of what the word 'set' should mean:
Definition 3.1. A set is any collection of individuals. We write x ∈ X to mean
that x is a member of a set X. The members of a set are often called its elements. Two
sets are equal if and only if they have the same elements.
Although this definition is similar to that formulated by Georg Cantor in 1895 there
are difficulties with it. For one thing, it defines a technical term 'set' by reference to two
undefined terms 'collection' and 'individual'. At this stage that should not create prob-
lems. There are, however, more serious problems which are discussed in the second/third
year Set Theory course. (Anyone who is interested can look up set theoretic paradoxes,
such as the famous one called Russell's Paradox.)
One particularly important set:
Definition 3.2. The empty set, written ∅, is the set with no elements.
Definition 3.3. Curly brackets (braces) are used to show sets. The set whose ele-
ments are a₁, a₂, a₃, . . . , aₙ is written {a₁, a₂, a₃, . . . , aₙ}. Similarly, the set whose mem-
bers are those of an infinite sequence a₁, a₂, a₃, . . . of objects is denoted {a₁, a₂, a₃, . . .}.
Examples 3.4. The sets {0, 1} and {1, 0} have the same elements, so they are equal.
Similarly, {2, 2} and {2} have the same elements, and so are equal.
A common error to avoid: never confuse a with {a}, the set whose only element is a.
For example, if a = ∅, then a has no elements, but {a} has one element (namely a), so
they cannot be equal. Or if a = N then a is infinite, but {a} is not.
We also have notation for a set whose members are identified by a property.
Definition 3.5. Let P or P(x) be a property, that is, an assertion involving a
variable x that may be true (or false) of any given individual x. Then {x | P(x)}, also
written {x : P(x)}, is the set of all objects x having the property P(x). Read it as 'the
set of all x such that P(x)' or 'the set of all x such that P holds for x'. If A is a set,
and P(x) is a property then we write {x ∈ A | P(x)} or {x ∈ A : P(x)} for the set
consisting of those elements x of A that have the property P.
Examples 3.6. The set of even natural numbers is {n ∈ N | n is even}. We could
write the set of primes as {n | n is a prime number}, or as {n ∈ N | n is prime}. The set
{1, 2, 3, 4, 6, 12} is equal to {n ∈ N | n is a factor of 12}. We could write ∅ = {x | x ≠ x}
or ∅ = {n ∈ N | n² < 0}.
[Note that the membership relation ∈ should look different from the Greek letter
ε (epsilon). Make sure you make ∅ look different from φ, the Greek letter phi.]
Some other important sets:
N is the set {0, 1, 2, 3, . . .} of all natural numbers (recall the warning in §2.1 above —
some authors do not include 0);
Z is the set of all integers (positive, negative, or zero) [Z is the first letter of the
German word Zahlen 'numbers'];
Q is the set of all rational numbers [Q for quotient] ;
R is the set of all real numbers;
C is the set of all complex numbers.
All the above are written in the 'blackboard bold' font which was originally a way
of writing bold-face letters on a blackboard, but has since taken on an independent life.
A purist (such as myself) would insist that this notation should be reserved for the sets
equipped with their most important structure (addition, multiplication, etc.). For the
purposes of these lectures, however, we'll use it just for the sets. You'll find that lecturers
use variations on this notation to denote closely related sets. Thus for example R⁺ or
R^{>0} often denotes the set of positive real numbers; C* often denotes the set of non-zero
complex numbers. Usually the intention should be clear from the context. Where it is
not, ask.
There are many interesting and important examples of sets that consist of real num-
bers. Perhaps the most commonly occurring are the intervals described as follows.
Definition 3.7 [Real intervals]. Let a, b be real numbers with a ≤ b. The follow-
ing are known as intervals:
(1) (a, b) := {x ∈ R | a < x < b} [open interval];
(2) [a, b] := {x ∈ R | a ≤ x ≤ b} [closed interval];
(3) (a, b] := {x ∈ R | a < x ≤ b} [half open interval];
(4) [a, b) := {x ∈ R | a ≤ x < b} [half open interval];
(5) (a, ∞) := {x ∈ R | a < x};
(6) [a, ∞) := {x ∈ R | a ≤ x};
(7) (−∞, b) := {x ∈ R | x < b};
(8) (−∞, b] := {x ∈ R | x ≤ b};
(9) (−∞, ∞) := R.
Note that if a = b then [a, b] = {a} and (a, b) = (a, b] = [a, b) = ∅. Check that you
understand why this follows from the definitions. Note also that we use the symbol ∞
in this context without giving it an independent meaning. It is NOT a real number. It
is easy to see (though perhaps tedious to write out because of the many cases) that an
interval S in R has the property that if x, y ∈ S and x ≤ z ≤ y then also z ∈ S. In
fact, the converse holds. Any set S with this property is an interval. But to prove this
one needs the completeness of R, a matter that will be treated in your Analysis course.
Exercise 3.1. Let S := {x ∈ Q | x² ≤ 2}. Show that S has the property that
if x, y ∈ S and x ≤ z ≤ y then also z ∈ S. Show also that S is not a rational
interval — that is, there do not exist a, b ∈ Q such that S = {x ∈ Q | a < x < b}
(or S = {x ∈ Q | a ≤ x ≤ b}, or any one of the other possibilities for an interval).
3.2 Some algebra of sets
We begin with set containment or set inclusion.
Definition 3.8 [Subsets]. The set A is said to be a subset of a set B if every
member of A is also a member of B. The notation is A ⊆ B or B ⊇ A. If A ⊆ B and
A ≠ B then we call A a proper subset of B.
Examples 3.9. Note that ∅ ⊆ X for every set X. Also N ⊆ Z ⊆ Q ⊆ R ⊆ C, and
any real interval S is a subset of R.
The containment ∅ ⊆ X is not simply convention. It follows from the definition.
After all, it is certainly true that every member of ∅ is a member of X.
Just as ≠ means 'is not equal to' and ≰ means 'is not less than or equal to' so we
often draw a line through other relation symbols to negate them. Thus a ∉ A means
that a is not a member of A and A ⊈ B means that A is not a subset of B (that is,
there is some object a ∈ A such that a ∉ B).
Observation 3.10. Let A, B be sets. Then A = B if and only if both A ⊆ B and
B ⊆ A.
Proof. Certainly, if A = B then every member of A is a member of B, so A ⊆ B,
and similarly, B ⊆ A. Conversely, if A ⊆ B and B ⊆ A then for every x, x ∈ A if
and only if x ∈ B, so A, B have the same members and therefore, by definition of set
equality, A = B.
Simple though this observation is, you will often find that when you wish to prove
two sets equal, breaking the problem down into the two complementary containments
helps greatly.
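The same double-containment test is also how one checks set equality computationally. A small sketch with Python's built-in sets (the particular sets chosen here are my own illustration): the operator <= is the subset relation ⊆.

```python
A = {n for n in range(1, 13) if 12 % n == 0}   # the factors of 12
B = {1, 2, 3, 4, 6, 12}

# Observation 3.10: A = B exactly when A ⊆ B and B ⊆ A.
assert A <= B and B <= A
assert A == B
print(A == B)  # True
```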
Definition 3.11 [Set union, intersection, difference]. Let A, B be sets. We
define their union (sometimes also called 'join') by
A ∪ B := {x | x ∈ A or x ∈ B (or both)}.
We define their intersection (sometimes also called 'meet') by
A ∩ B := {x | both x ∈ A and x ∈ B}.
We define their set difference by
A \ B := {x | x ∈ A and x ∉ B}.

Examples 3.12. If A := {n ∈ N | n is even} and B := {n ∈ N | n is prime} then
A ∩ B = {2}.
{0, 1, 2} ∪ {2, 3} = {0, 1, 2, 3}; {0, 1, 2} ∩ {2, 3} = {2}; {0, 1, 2} \ {2, 3} = {0, 1}.
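These three operations are built into Python's set type, so the examples above can be checked directly (a sketch; the operators |, & and - denote ∪, ∩ and \ respectively):

```python
A = {0, 1, 2}
B = {2, 3}

print(A | B)  # union A ∪ B: {0, 1, 2, 3}
print(A & B)  # intersection A ∩ B: {2}
print(A - B)  # set difference A \ B: {0, 1}
```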
Theorem 3.13. Let A, B, C be sets. Then A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C).
Also A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C).
[This is where diagrams, so-called Venn diagrams, are very instructive.]
Proof (of first part). We use Observation 3.10. Suppose first that x ∈ A ∪ (B ∩ C).
Then either x ∈ A or x ∈ B ∩ C. Thus either x ∈ A or x is in both B and C. If x ∈ A
then x ∈ A ∪ B and x ∈ A ∪ C, so x ∈ (A ∪ B) ∩ (A ∪ C). If x is in both B and C then
x is in both A ∪ B and A ∪ C, and so x ∈ (A ∪ B) ∩ (A ∪ C). Thus every member of
A ∪ (B ∩ C) lies in (A ∪ B) ∩ (A ∪ C). That is, A ∪ (B ∩ C) ⊆ (A ∪ B) ∩ (A ∪ C).
Now suppose that x ∈ (A ∪ B) ∩ (A ∪ C). Then x is in both A ∪ B and A ∪ C. Thus
either x ∈ A or, if x ∉ A, then x ∈ B and also x ∈ C. Thus x ∈ A ∪ (B ∩ C). Hence
(A ∪ B) ∩ (A ∪ C) ⊆ A ∪ (B ∩ C). Therefore these two sets are equal.
Exercise 3.2. Write a proof of the second part of the theorem. That is, prove
that A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C).
Theorem 3.14 [De Morgan's Laws]. Let A, B be subsets of a set X. Then
X \ (A ∪ B) = (X \ A) ∩ (X \ B) and X \ (A ∩ B) = (X \ A) ∪ (X \ B).
The proof is left as an exercise:
Exercise 3.3. Write a proof of De Morgan's Laws.
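Again, a computational check is not a proof, but it can be reassuring while you work on Exercise 3.3. The following sketch in Python verifies both laws for every pair of subsets A, B of a small set X (my own choice of X):

```python
import itertools

X = set(range(6))
# All 2^6 = 64 subsets of X.
subsets = [set(c) for r in range(len(X) + 1)
           for c in itertools.combinations(X, r)]

# Check De Morgan's Laws for every pair of subsets A, B of X.
for A in subsets:
    for B in subsets:
        assert X - (A | B) == (X - A) & (X - B)
        assert X - (A & B) == (X - A) | (X - B)
print("checked", len(subsets)**2, "pairs")
```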
Sometimes we have a family of sets {Aᵢ}ᵢ∈I indexed by a set I. For example, we may
have sets A₁, A₂, . . . , Aₙ, or we may have sets A₁, A₂, . . . , Aₙ, . . . , one for each natural
number, or we could have sets Aₓ, one for each x ∈ R. Then union and intersection are
defined by
⋃_{i∈I} Aᵢ := {x | x ∈ Aᵢ for at least one i ∈ I}
and, provided that I ≠ ∅,
⋂_{i∈I} Aᵢ := {x | x ∈ Aᵢ for every i ∈ I}.
Note that if I has two members, say I = {1, 2}, then the union of the family is simply
A₁ ∪ A₂, and the intersection is just A₁ ∩ A₂.
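An indexed family can be modelled as a dictionary mapping each index i to the set Aᵢ; the family below is my own illustrative choice. Note how reducing over the family mirrors the proviso I ≠ ∅: folding intersection over an empty family would fail, just as ⋂_{i∈I} Aᵢ is left undefined when I = ∅.

```python
from functools import reduce

# A family {A_i} indexed by I = {0, 1, 2, 3}: A_i is the set of
# multiples of i + 1 below 20.
I = range(4)
A = {i: {n for n in range(20) if n % (i + 1) == 0} for i in I}

union = set().union(*(A[i] for i in I))
# Requires I to be non-empty, matching the proviso in the definition.
intersection = reduce(set.intersection, (A[i] for i in I))
print(sorted(intersection))  # [0, 12], the multiples of 12 below 20
```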
Exercise 3.4. Let A be a set and {Bᵢ}ᵢ∈I a family of sets indexed by a non-empty
set I. Show that
A ∩ (⋃_{i∈I} Bᵢ) = ⋃_{i∈I} (A ∩ Bᵢ) and A ∪ (⋂_{i∈I} Bᵢ) = ⋂_{i∈I} (A ∪ Bᵢ).
Exercise 3.5. Let X be a set and {Aᵢ}ᵢ∈I a family of subsets of X indexed by a
non-empty set I. Show that
X \ (⋃_{i∈I} Aᵢ) = ⋂_{i∈I} (X \ Aᵢ) and X \ (⋂_{i∈I} Aᵢ) = ⋃_{i∈I} (X \ Aᵢ).
Definition 3.15 [Cardinality of a set]. Suppose that A = {a₁, a₂, . . . , aₙ} where
aᵢ ≠ aⱼ whenever i ≠ j. Thus A is a set with n elements (where n is a natural number).
Then we write |A| = n, and say that n is the cardinality (or size) of A.
Perhaps better would be to use recursion to define finiteness of a set and the cardinality
of a finite set: ∅ is finite and |∅| = 0; then if A is finite with |A| = n, and B = A ∪ {b},
where b ∉ A, then also B is finite, and |B| = n + 1. It is then a non-trivial fact, but one
which we shall take for granted, that if X is finite and |X| = n + 1, and Y is obtained
from X by removing any member x, no matter which, then Y is finite and |Y| = n.
You may like to think why I describe this as non-trivial, and how you would set about
justifying the assertion.
Exercise 3.6. Let A, B be finite sets. Show that
A ∪ B = (A \ B) ∪ (A ∩ B) ∪ (B \ A).
Deduce that |A| + |B| = |A ∪ B| + |A ∩ B|.
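The identity in Exercise 3.6 is easy to try out on particular finite sets before proving it in general; the two sets below are my own choice. In Python, len plays the role of |·|:

```python
A = {n for n in range(30) if n % 2 == 0}      # the even numbers below 30
B = {2, 3, 5, 7, 11, 13, 17, 19, 23, 29}      # the primes below 30

# A ∪ B decomposes into three pairwise disjoint pieces.
assert A | B == (A - B) | (A & B) | (B - A)

# Hence |A| + |B| = |A ∪ B| + |A ∩ B|.
assert len(A) + len(B) == len(A | B) + len(A & B)
print(len(A), len(B), len(A | B), len(A & B))  # 15 10 24 1
```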
The sizes, that is cardinalities, of infinite sets will be touched on in the Analysis course.
Definition 3.16 [Power set]. We define the power set of a set A by ℘A := {X |
X ⊆ A}. That is, the power set is the set of all subsets of A.
Theorem 3.17. Let A be a finite set with |A| = n. Then ℘A is finite and |℘A| = 2ⁿ.
Proof. We use induction. If |A| = 0 then A has no members, that is, A = ∅. Since
∅ is the only subset of ∅, ℘∅ = {∅}. Thus |℘A| = 1 = 2⁰.
Now suppose that n ≥ 0 and that |℘X| = 2ⁿ for any set X of size n. Let A be a
set with |A| = n + 1. Since n + 1 ≥ 1, A ≠ ∅ and so there is a member a of A. Let
B := A \ {a}. Then |B| = n and so, by inductive hypothesis, |℘B| = 2ⁿ. Those subsets
of A that do not have a as a member are subsets of B, so there are 2ⁿ of them. Those
subsets of A that do have a as a member are of the form {a} ∪ C where C ranges over
subsets of B, and so again there are 2ⁿ of them. Since any subset of A does or does not
contain a, we see that |℘A| = 2ⁿ + 2ⁿ = 2ⁿ⁺¹. Thus, by induction, the theorem is true
for all finite sets A.
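The theorem can be illustrated by generating the power set of a small finite set and counting. A sketch in Python (the name `power_set` is mine); frozenset is used because the elements of ℘A are themselves sets:

```python
from itertools import combinations

def power_set(A):
    """All subsets of a finite set A, as a list of frozensets."""
    A = list(A)
    return [frozenset(c) for r in range(len(A) + 1)
            for c in combinations(A, r)]

# |℘A| = 2^n whenever |A| = n (Theorem 3.17).
for n in range(7):
    assert len(power_set(range(n))) == 2**n
print(len(power_set({'a', 'b', 'c'})))  # 8
```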
Exercise 3.7. Let k be a natural number, and for a finite set A with |A| = n let
℘ₖ(A) be the set of its subsets of size k (that is, ℘ₖ(A) := {B ∈ ℘A | |B| = k}).
Use induction on n (or perhaps more precisely on n − k) together with Lemma 2.6
(Pascal's Triangle) to show that if k ≤ n then $|℘_k(A)| = \binom{n}{k}$.
3.3 Ordered pairs; cartesian products of sets
Definition 3.18. The ordered pair whose first element is a and whose second ele-
ment is b, is written (a, b) . The defining property is that (a, b) = (c, d) if and only if
a = c and b = d.
The point is that, in an ordered pair one member is first, the other second. Contrast
this with the unordered pair {a, b} , where we cannot distinguish first and second elements;
{a, b} and {b, a} have the same elements, and so they are equal.
Warning: there is a problem with notation here. If a, b ∈ R are real numbers and
a < b, then the ordered pair whose first element is a and whose second element is b,
and the open interval between a and b, are both written (a, b). Usually the context will
indicate what is intended, but if, in your work, there is the possibility of confusion, then
remove the ambiguity using words to clarify. Write something like 'the open interval
(a, b)', or 'the ordered pair (a, b)'. We could of course change our notation: for example,
we could write the ordered pair as ⟨a, b⟩ or the open interval as ]a, b[ (though it is bad
policy to use brackets the wrong way round since the eye expects to pair brackets as
[. . .]). No such scheme is accepted by everybody. One reason is that there are more
mathematical concepts than notations to represent them, so ambiguities like this will
unfortunately but inevitably occur.
We can also define ordered triples (a, b, c), ordered quadruples (a, b, c, d), etc. in the
same manner. A sequence n long is called an n-tuple (though NOT if n is small).
Definition 3.19. The Cartesian product of sets A, B (which may be the same) is
defined by
A × B := {(a, b) | a ∈ A and b ∈ B}.
If A = B, we also write A × A as A². More generally, we define A₁ × A₂ × ⋯ × Aₙ to
be the set of all ordered n-tuples (a₁, a₂, . . . , aₙ) such that aᵢ ∈ Aᵢ for 1 ≤ i ≤ n.
The product of n copies of A may be written as Aⁿ. Note that A¹ = A and
A⁰ = {∅}. Note also that the elements of the Cartesian product A × B are ordered pairs
(a, b). They are never written a × b.
The sets A × B × C, (A × B) × C and A × (B × C) are not equal. Members of
the first are ordered triples with first member from the set A, second from the set B,
third from the set C, whereas those of the second are ordered pairs whose first member
is an ordered pair, and those of the third are ordered pairs whose second member is an
ordered pair. Nevertheless, they may be identified in a natural way, and few of us would
be so pedantic as to distinguish them except in very special circumstances. Thus to write
A × B × C = (A × B) × C = A × (B × C) is wrong, but usually harmless.
Examples 3.20. The most familiar example of a Cartesian product is the Euclidean
plane, which we regard as being equal to the set of ordered pairs of real numbers ℝ² or
ℝ × ℝ. Indeed, it is by rather imaginative and far-fetched analogy with this important
example that mathematicians have come to associate Descartes' name with the general
set-theoretic construction.
If A = {1, 2} and B = {3, 4, 5}, then A × B = {(1, 3), (1, 4), (1, 5), (2, 3), (2, 4), (2, 5)},
while B × A = {(3, 1), (3, 2), (4, 1), (4, 2), (5, 1), (5, 2)}.
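For finite sets these products can be computed mechanically. The following Python sketch (an illustration of the definition, not part of the course) reproduces the example above using the standard library's itertools.product:

```python
from itertools import product

A = {1, 2}
B = {3, 4, 5}

# The Cartesian product A x B as a set of ordered pairs (a, b).
A_x_B = set(product(A, B))
B_x_A = set(product(B, A))

print(sorted(A_x_B))  # the pairs (a, b) with a in A and b in B
print(sorted(B_x_A))

# Here A x B and B x A share no elements at all, since A and B are disjoint.
print(A_x_B == B_x_A)  # False
```

Note that the order of arguments to product matters, exactly as the example shows.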
Theorem 3.21. Let A, B, C, D be sets. Then (A × B) ∩ (C × D) = (A ∩ C) × (B ∩ D).
Proof. Let (x, y) ∈ (A × B) ∩ (C × D). Then (x, y) ∈ A × B and (x, y) ∈ C × D. Since
(x, y) ∈ A × B, x ∈ A and y ∈ B. Similarly x ∈ C and y ∈ D. Therefore x ∈ A ∩ C and
y ∈ B ∩ D. So (x, y) ∈ (A ∩ C) × (B ∩ D). Thus (A × B) ∩ (C × D) ⊆ (A ∩ C) × (B ∩ D).
Now let (x, y) ∈ (A ∩ C) × (B ∩ D). Then x ∈ A ∩ C and y ∈ B ∩ D. Therefore x ∈ A
and y ∈ B, so (x, y) ∈ A × B, but also x ∈ C and y ∈ D, so (x, y) ∈ C × D. Thus
(x, y) ∈ (A × B) ∩ (C × D). This shows that (A ∩ C) × (B ∩ D) ⊆ (A × B) ∩ (C × D).
By Observation 3.10 therefore, (A × B) ∩ (C × D) = (A ∩ C) × (B ∩ D), as required.
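The theorem can also be spot-checked on small finite sets; a check is of course not a proof, but it is a useful habit. A Python sketch with sample sets of our own choosing:

```python
from itertools import product

A, B, C, D = {1, 2, 3}, {4, 5}, {2, 3, 9}, {5, 6}

# (A x B) ∩ (C x D), computed directly from the definitions
lhs = set(product(A, B)) & set(product(C, D))
# (A ∩ C) x (B ∩ D)
rhs = set(product(A & C, B & D))

print(lhs == rhs)  # True, agreeing with Theorem 3.21 on this example
```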
Exercise 3.8. Is it the case that for all sets A, B, C, D,
(A × B) ∪ (C × D) = (A ∪ B) × (C ∪ D)?
Exercise 3.9. Prove that if A, B are finite sets then A × B is finite and |A × B| =
|A| × |B|.
4 Relations and functions
In ordinary conversation a relationship can hold between one or more objects: 'tar is
black' uses being black as a unary (that is, 1-place) relationship; 'Elizabeth is the mother
of Charles' uses being mother of as a binary (2-place) relationship; 'Milton Keynes lies
between Oxford and Cambridge' uses being between as a ternary (3-place) relationship.
In these lectures we will focus on binary relationships, as they are the ones needed for mathematics.
4.1 Binary relations
In mathematics a binary relation is something like =, ≤ or ⊆ which asserts a certain
relationship between two objects. We formalise this idea by identifying a relationship
aRb with the set of ordered pairs (a, b) which are connected by the relation.
Definition 4.1. A binary relation between sets A and B is a subset of A × B. A
binary relation on a set A is a subset of A × A. If R is a binary relation, we write
(a, b) ∈ R and aRb interchangeably.
Although unary, ternary, …, k-ary relations do occur in mathematics, they are far
less common than binary relations. It is usual therefore to use 'relation' to mean 'binary
relation', and I shall do so from now on.
Examples 4.2. The successor relation on ℕ is the set {(a, b) ∈ ℕ² | b = a + 1}.
The order relation on the set of real numbers is the set {(a, b) ∈ ℝ² | a ≤ b}.
For a set X the subset relation on ℘X is the relation {(A, B) ∈ (℘X)² | A ⊆ B}.
There are very many different kinds of relations. One of the most important kinds is
the equivalence relation, which asserts that two objects are, in some sense, to be treated
as being the same. To prepare for the notion we need some further terminology.
Definition 4.3. Let R be a relation on a set A. To say that
R is reflexive means that aRa for all a ∈ A;
R is symmetric means that whenever a, b ∈ A and aRb, also bRa;
R is transitive means that whenever a, b, c ∈ A and both aRb and bRc, also aRc.
Examples 4.4. The relations =, ≤, ⊆ are reflexive; the relations <, ≠ are not.
Relations =, ≠, 'have the same size' (for sets) are symmetric; relations ≤, ⊆ are not.
Relations =, ≤, ⊆ are transitive; relations ≠, 'is the immediate successor of' (on ℕ)
are not transitive.
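On a finite set these three properties can be tested directly by treating a relation as a set of ordered pairs. In the following sketch the helper names are our own, not notation from these notes:

```python
def is_reflexive(R, A):
    # aRa for all a in A
    return all((a, a) in R for a in A)

def is_symmetric(R):
    # whenever aRb, also bRa
    return all((b, a) in R for (a, b) in R)

def is_transitive(R):
    # whenever aRb and bRc, also aRc
    return all((a, d) in R
               for (a, b) in R for (c, d) in R if b == c)

A = {0, 1, 2, 3}
leq = {(a, b) for a in A for b in A if a <= b}       # the order relation
succ = {(a, b) for a in A for b in A if b == a + 1}  # the successor relation

print(is_reflexive(leq, A), is_symmetric(leq), is_transitive(leq))  # True False True
print(is_transitive(succ))  # False: 0R1 and 1R2, but not 0R2
```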
Exercise 4.1. Which of the following relations on ℕ are reflexive, which are
symmetric, which are transitive?
(i) the relation a | b (read as 'a divides b');
(ii) the relation a ∤ b ('a does not divide b');
(iii) a, b are related if a, b leave the same remainder after division by 2012;
(iv) a, b are related if hcf(a, b) > 2012.
Definition 4.5. A relation R on a set A is said to be an equivalence relation if
and only if R is reflexive, symmetric and transitive. Symbols ∼, ≈, ≃, ≡, and others
like them, are often used to denote a relation that is known to be an equivalence relation.
Examples 4.6. The relation of equality on any set is always an equivalence relation.
For any set A, if R := A × A then R is an equivalence relation (the 'universal' relation).
The relation R on ℕ such that mRn if and only if m and n have the same number of
prime factors is an equivalence relation.
The relation of being congruent is an equivalence relation on the set of triangles in ℝ².
The relation of being similar is an equivalence relation on the set of triangles in ℝ².
The relation ≤ on ℝ is not symmetric, so is not an equivalence relation.
The relation R on ℝ such that xRy if and only if |x − y| < 1 is not transitive, so is not
an equivalence relation.
It is very often the case in mathematics that a situation can be fruitfully viewed in
more than one way. That is certainly the case with equivalence relations. An equivalence
relation on a set A is a way of saying that two elements of A are 'essentially' the same.
It divides A up into subsets of elements that are in some way the same as each other.
That is, it gives rise to a partition of A.
Definition 4.7. A partition of a set A is a set Π of subsets of A with the following
properties:
(1) ∅ ∉ Π (that is, all the sets in Π are non-empty);
(2) ⋃_{P∈Π} P = A (that is, every member of A lies in one of the members of Π);
(3) if P, Q ∈ Π and P ≠ Q then P ∩ Q = ∅.
Examples 4.8. If Π := {{2n : n ∈ ℕ}, {2n + 1 : n ∈ ℕ}} then Π is a partition of ℕ
(into two parts);
If Π := {{0}, {1, 4, 5}, {2, 3}} then Π is a partition of {0, 1, 2, 3, 4, 5} (into three
parts).
Exercise 4.2. How many partitions are there of a set of size 1? of size 2? of
size 3? of size 4? of size 5?
Observation 4.9. Each partition of a set A is naturally associated with an equiv-
alence relation on A. Indeed, given the partition Π we define a ∼ b to mean that the
elements a, b of A lie in the same part of Π.
Formally this says that a ∼ b if there is some member P of Π such that a, b ∈ P.
Since the union of the members of Π is the whole of A, for any a ∈ A there must exist
some P ∈ Π with a ∈ P. Then, trivially, a and a both lie in P, so a ∼ a, that is, the
relation is reflexive. Also, if a ∼ b then a, b ∈ P where P ∈ Π, and then of course also
b, a ∈ P, so b ∼ a. Thus the relation is symmetric. Lastly, if a ∼ b and b ∼ c then there
exist P, Q ∈ Π such that a, b ∈ P and b, c ∈ Q. But now b ∈ P ∩ Q, so P ∩ Q ≠ ∅, and
therefore by condition (3) for a partition, P = Q. Therefore a, c ∈ P, so a ∼ c, and so
the relation is transitive. Being reflexive, symmetric and transitive, ∼ is an equivalence
relation.
Conversely, any equivalence relation on a set A naturally defines a partition of A.
Definition 4.10. Let ∼ be an equivalence relation on a set A. For a ∈ A define
[a] := {b ∈ A | a ∼ b}, the equivalence class of a.
That the equivalence classes form a partition of the set A is a theorem that is to
be proved in a later Prelim lecture course. (It is not particularly hard, so you may like
to anticipate and find a proof for yourself.) Thus equivalence relations and partitions
correspond to each other in a natural way.
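Anticipating that theorem, for a finite set the equivalence classes can be computed straight from Definition 4.10. The sketch below uses 'same remainder after division by 3' on {0, …, 8}, a small variant of Exercise 4.1(iii):

```python
A = range(9)

def equivalent(a, b):
    # a ~ b iff a and b leave the same remainder after division by 3
    return a % 3 == b % 3

def eq_class(a):
    # the equivalence class [a] := {b in A | a ~ b}
    return frozenset(b for b in A if equivalent(a, b))

# Collecting the distinct classes yields a partition of A.
partition = {eq_class(a) for a in A}
print(sorted(sorted(P) for P in partition))
# [[0, 3, 6], [1, 4, 7], [2, 5, 8]]
```

Observe that the three classes are non-empty, cover A, and are pairwise disjoint, exactly as Definition 4.7 requires.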
4.2 Functions
The concept of a function from a set A to a set B is simple enough: it is a rule assigning
exactly one element of B to each element of A . We formalise the concept by formulating
it in set-theoretic language as a special kind of binary relation:
Definition 4.11. A function from a set A to a set B is a binary relation f between
A and B such that for each a ∈ A there is exactly one b ∈ B such that (a, b) ∈ f.
We write f(a) = b or sometimes f : a ↦ b. We write f : A → B to mean that f is a
function from A to B.
If f : A → B, we refer to A as its domain and B as its codomain.
This definition makes clear that if f : A → B and g : A → B then f = g if and only
if f(a) = g(a) for every a ∈ A.
Warning: always make sure that your recipe for defining a function makes sense. For
example, if we are seeking to define a function f : ℝ → ℝ, then the recipe f(x) := 1/x
fails since f(0) is undefined; similarly, the recipe f(x) := y where y² = x fails since it
does not return a unique value: is f(4) equal to 2 or −2? In such cases, where either
f(x) cannot always be defined, or where f(x) appears to take more than one value, there
is something wrong with the definition: we say that f is ill-defined. Our interest is in
well-defined functions.
Definition 4.12. For f : A → B the set of values {f(a) ∈ B | a ∈ A} is known as
the range or the image of f. We emphasize that it is a subset of B.
More generally, for f : A → B and X ⊆ A we define the image of X (under f) by
f(X) := {f(x) ∈ B | x ∈ X}. Thus the range of f is f(A).
Warning: the terminology of codomains and ranges is not always used consistently.
Make sure that you work out from the context how a writer intends these words.
Definition 4.13. For f : A → B and Y ⊆ B we define the preimage of Y (under
f) by f⁻¹(Y) := {x ∈ A | f(x) ∈ Y}.
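For a function with finite domain, images and preimages can be computed directly from Definitions 4.12 and 4.13. In this Python sketch (our own illustration) the function is stored as a dict:

```python
# f : {0, 1, 2, 3} -> {0, 1, 2}, represented as a dict
f = {0: 1, 1: 2, 2: 1, 3: 0}

def image(f, X):
    # f(X) := {f(x) | x in X}, a subset of the codomain
    return {f[x] for x in X}

def preimage(f, Y):
    # f^{-1}(Y) := {x | f(x) in Y}, a subset of the domain
    return {x for x in f if f[x] in Y}

print(image(f, {0, 2}))      # {1}
print(image(f, f.keys()))    # the range of f: {0, 1, 2}
print(preimage(f, {1}))      # {0, 2}
print(preimage(f, {9}))      # set(): nothing maps into {9}
```

Note that preimage makes perfectly good sense here even though no inverse function f⁻¹ exists, just as the warning below explains.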
Warning: there are serious possibilities of notational confusion here. If X ⊆ A and
x ∈ A then f(X) and f(x) look similar, even though they are different kinds of object:
the former is a set (a subset of B), the latter a single value (a member of B). It is even
worse with the preimage f⁻¹(Y): it is an important piece of notation for an important
concept even when f⁻¹ has no meaning on its own, as often happens.
Definition 4.14. If f : A → B and X ⊆ A, the restriction f|X of f to X is the
function X → B such that (f|X)(x) = f(x) for all x ∈ X.
Thus the restriction of f to X is little different from f; only its domain is a subset
of that of f. Restrictions are written in various ways. You might see the restriction
written as f|X, f|_X, or f↾X, for example.
Exercise 4.3. Let A := {0, 1, 2} and B := {0, 1, 2, 3, 4}. Let f : A → B be
defined by f : x ↦ x + 1, and let g : B → A be defined by g(x) := 0 if x is even,
g(x) := 1 if x is odd.
(i) How many different sets f(X) are there for X ⊆ A?
(ii) How many different sets g(Y) are there for Y ⊆ B?
(iii) How many different sets f⁻¹(Y) are there for Y ⊆ B?
(iv) How many different sets g⁻¹(X) are there for X ⊆ A?
Exercise 4.4. Define f : ℝ → ℝ by f(x) := sin x for all real numbers x.
(i) What are f([0, π]), f([0, 2π]), f([0, 3π])?
(ii) What are f⁻¹({0}), f⁻¹({1}), f⁻¹({2})?
(iii) Let A := [0, π] and B := [2π, 3π]. Show that f(A ∩ B) ≠ f(A) ∩ f(B).
(iv) Let A := [0, π]. Find f(A), f⁻¹(f(A)), and f(f⁻¹(A)).
Exercise 4.5. Let X, Y be sets and f : X → Y; let A, B ⊆ X and C, D ⊆ Y.
Prove that:
(i) f(A) ∪ f(B) = f(A ∪ B);
(ii) f⁻¹(C) ∪ f⁻¹(D) = f⁻¹(C ∪ D);
(iii) f⁻¹(C) ∩ f⁻¹(D) = f⁻¹(C ∩ D).
Exercise 4.6. Let f : A → B, where A, B are sets.
Show that if X ⊆ A then X ⊆ f⁻¹(f(X)). Must equality hold?
Is it always true that if Y ⊆ B then Y ⊆ f(f⁻¹(Y))?
Definition 4.15. A function f : A → B is said to be
(1) one-to-one, or injective, or an injection, if whenever a₀, a₁ ∈ A and a₀ ≠ a₁, also
f(a₀) ≠ f(a₁); equivalently, |f⁻¹({b})| = 1 for every b ∈ f(A);
(2) onto, or surjective, or a surjection, if for every b ∈ B there exists a ∈ A such that
f(a) = b; equivalently, f(A) = B; or equivalently, f⁻¹({b}) ≠ ∅ for every b ∈ B;
(3) one-to-one and onto, or bijective, or a bijection, if and only if it is both injective
and surjective.
Examples 4.16. Our examples will be functions from ℝ to ℝ.
(1) The function f : x ↦ x² is not injective because f(1) = f(−1) while 1 ≠ −1. It
is not surjective either because there is no real number x for which f(x) = −1 (so
−1 is not in the range of f).
(2) The function g : x ↦ eˣ is one-to-one. It is not surjective, however, because again
−1 is not in the range of g.
(3) The function h : x ↦ x³ − x is onto (can you see why?). However it is not
one-to-one, because, for example, h(0) = h(1), whereas of course 0 ≠ 1.
(4) The function k : x ↦ x³ is both one-to-one and onto, so it is a bijection.
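Over a finite domain, injectivity and surjectivity are easy to test mechanically. A sketch, with functions again stored as dicts and helper names of our own:

```python
def is_injective(f):
    # no two domain elements share a value
    values = list(f.values())
    return len(values) == len(set(values))

def is_surjective(f, codomain):
    # every element of the codomain occurs as a value
    return set(f.values()) == set(codomain)

A = {0, 1, 2}
B = {0, 1, 2, 3}

f = {a: a + 1 for a in A}  # 0->1, 1->2, 2->3: injective but not onto B
g = {b: b % 3 for b in B}  # 0->0, 1->1, 2->2, 3->0: onto A but not injective

print(is_injective(f), is_surjective(f, B))  # True False
print(is_injective(g), is_surjective(g, A))  # False True
```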
Exercise 4.7. Let A := {0, 1, 2} and B := {0, 1, 2, 3, 4}.
(i) How many f : A → B are there?
(ii) How many f : B → A are there?
(iii) How many injective f : A → B are there?
(iv) How many injective f : B → A are there?
(v) How many surjective f : A → B are there?
(vi) How many surjective f : B → A are there?
4.3 Algebra of functions
There is a very important way in which functions can be combined.
Definition 4.17. If f : A → B and g : B → C then the composition of f and g,
written g ∘ f, is the function A → C defined by the equation (g ∘ f)(a) := g(f(a)) for
every a ∈ A.
Composition is familiar in calculus as 'function of a function'.
Examples 4.18. Consider functions f, g : ℝ → ℝ: if f(x) := x² and g(x) := cos(x)
then (g ∘ f)(x) = cos(x²), while (f ∘ g)(x) = (cos x)² (more usually written cos² x); if
f(x) := x⁶ and g(x) := eˣ then (g ∘ f)(x) = e^(x⁶), while (f ∘ g)(x) = (eˣ)⁶ = e⁶ˣ.
Notice that if f : A → B and g : B → C then both g ∘ f and f ∘ g are defined only
if C = A. These examples show that it can very well happen that then g ∘ f ≠ f ∘ g.
Indeed, generally speaking, it is very rare that equality holds. That is to say, composition
of functions is not, in general, commutative.
Theorem 4.19. Let A, B, C be sets, f : A → B, g : B → C.
(1) If f, g are injective then g ∘ f is injective.
(2) If f, g are surjective then g ∘ f is surjective.
(3) If f, g are bijective then g ∘ f is bijective.
Proof. Clearly, (3) is the conjunction of (1) and (2), so it is only these that need to
be demonstrated. We show (1) and leave (2) as an exercise.
Suppose that both f and g are injective. Let a₀, a₁ ∈ A and suppose that
(g ∘ f)(a₀) = (g ∘ f)(a₁). This means that g(f(a₀)) = g(f(a₁)). Since g is injective
it must be the case that f(a₀) = f(a₁). And now, since f is injective, also a₀ = a₁.
Therefore g ∘ f is injective.
Exercise 4.8. Write out a proof that if A, B, C are sets and both f : A → B,
g : B → C are surjective then g ∘ f is surjective.
Exercise 4.9. Let f : A → B and g : B → C, where A, B, C are sets. Prove
that if g ∘ f is injective then f is injective. Prove also that if g ∘ f is surjective
then g is surjective.
Definition 4.20. The identity function on a set A is the function A → A defined
by a ↦ a for all a ∈ A. It is denoted 1_A (or just 1 when no ambiguity threatens) or id_A.
Observation 4.21. If A, B are sets and f : A → B then 1_B ∘ f = f and f ∘ 1_A = f.
In particular, for any set A and any function f : A → A, 1_A ∘ f = f ∘ 1_A = f.
Although the operation of composition of functions is not usually commutative it is
what is called associative. Indeed, this is one of the reasons why the associative law
(which you will come across many times very soon) is so very important in mathematics.
Theorem 4.22 [Composition of functions is associative]. Let f : A → B,
g : B → C, and h : C → D, where A, B, C, D are any sets. Then
h ∘ (g ∘ f) = (h ∘ g) ∘ f.
Proof. For any a ∈ A, let b := f(a) ∈ B, c := g(b) ∈ C, and d := h(c) ∈ D. Then
(g ∘ f)(a) = g(f(a)) = g(b) = c, and so (h ∘ (g ∘ f))(a) = h((g ∘ f)(a)) = h(c) = d. Also,
(h ∘ g)(b) = h(g(b)) = h(c) = d, whence ((h ∘ g) ∘ f)(a) = (h ∘ g)(f(a)) = (h ∘ g)(b) = d.
Thus (h ∘ (g ∘ f))(a) = ((h ∘ g) ∘ f)(a) for every a ∈ A, that is, h ∘ (g ∘ f) = (h ∘ g) ∘ f,
as required.
Observation 4.23. Let A, B be sets, f : A → B a function. If g, h : B → A are
such that g ∘ f = h ∘ f = 1_A and f ∘ g = f ∘ h = 1_B then g = h.
Proof. For, then g = g ∘ 1_B = g ∘ (f ∘ h) = (g ∘ f) ∘ h = 1_A ∘ h = h.
Definition 4.24. A function f : A → B is said to be invertible if there exists a
function g : B → A such that g ∘ f = 1_A and f ∘ g = 1_B.
By Observation 4.23, g is then unique. It is called the inverse of f and we write
g = f⁻¹. Note that, directly from this definition, g is also invertible and g⁻¹ = f, that
is, (f⁻¹)⁻¹ = f.
Theorem 4.25. Let A, B, C be sets, f : A → B, g : B → C. If f, g are invertible
then g ∘ f is invertible and (g ∘ f)⁻¹ = f⁻¹ ∘ g⁻¹.
Proof. For, using associativity several times, together with the definition of inverses,
we see that
(f⁻¹ ∘ g⁻¹) ∘ (g ∘ f) = ((f⁻¹ ∘ g⁻¹) ∘ g) ∘ f
= (f⁻¹ ∘ (g⁻¹ ∘ g)) ∘ f
= (f⁻¹ ∘ 1_B) ∘ f = f⁻¹ ∘ f = 1_A,
and similarly (g ∘ f) ∘ (f⁻¹ ∘ g⁻¹) = 1_C. Therefore g ∘ f is invertible and its inverse is
f⁻¹ ∘ g⁻¹, as claimed.
The following is an important and useful criterion for invertibility.
Theorem 4.26. A function f : A → B is invertible if and only if it is bijective.
Proof. Suppose first that f : A → B is invertible. If f(a₀) = f(a₁), then
f⁻¹(f(a₀)) = f⁻¹(f(a₁)), that is, (f⁻¹ ∘ f)(a₀) = (f⁻¹ ∘ f)(a₁); so 1_A(a₀) = 1_A(a₁),
which means that a₀ = a₁. Therefore f is injective (one-to-one). Also f is surjective
(onto), because if b ∈ B, then f(f⁻¹(b)) = (f ∘ f⁻¹)(b) = 1_B(b) = b, so f⁻¹(b) is a
member of A whose image under f is b.
Now suppose that f : A → B is bijective. Define g : B → A by the rule that
g(b) := a if f(a) = b. We must show that g is well-defined. If b ∈ B then, because f is
surjective (onto), there exists a ∈ A such that f(a) = b, so there do exist candidates for
g(b). Now if f(a) = b and also f(a′) = b (where of course a, a′ ∈ A) then f(a) = f(a′)
and so, since f is injective (one-to-one), a = a′, which means that there is a unique
possibility for g(b). So g is well-defined. For a ∈ A, if b := f(a), then by definition
g(b) = a, that is g(f(a)) = a, or (g ∘ f)(a) = a: thus g ∘ f = 1_A. Similarly, if b ∈ B and
a := g(b), then by definition of g it must be the case that f(a) = b, whence f(g(b)) = b:
thus (f ∘ g)(b) = b, and since this is true for every b ∈ B, f ∘ g = 1_B. Therefore f is
invertible (and g = f⁻¹), as required.
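For a finite bijection stored as a dict, the inverse function g constructed in this proof amounts simply to swapping the pairs:

```python
k = {0: 3, 1: 4, 2: 5}   # a bijection {0, 1, 2} -> {3, 4, 5}

# Since k is bijective, swapping each pair (a, b) to (b, a) yields a function:
k_inv = {b: a for a, b in k.items()}

print(k_inv)  # {3: 0, 4: 1, 5: 2}
# g ∘ f is the identity on the domain, f ∘ g the identity on the codomain:
print(all(k_inv[k[a]] == a for a in k))       # True
print(all(k[k_inv[b]] == b for b in k_inv))   # True
```

Had k not been injective, the swapped dict would silently lose entries, which is the computational shadow of g being ill-defined.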
There are important 'one-sided' analogues of Theorem 4.26.
Definition 4.27. Let A, B be sets. A function f : A → B is said to be left invertible
if there exists g : B → A such that g ∘ f = 1_A. Then g is called a left inverse of f.
Similarly, f : A → B is said to be right invertible if there exists h : B → A such that
f ∘ h = 1_B, and then h is called a right inverse of f.
Theorem 4.28. Let A, B be sets, and suppose that A ≠ ∅.
(1) A function f : A → B is left invertible if and only if it is injective.
(2) A function f : A → B is right invertible if and only if it is surjective.
Proof. Although it is very similar to the proof of Theorem 4.26, we show why (1) is
true. We leave you to write a proof of (2).
Suppose that f : A → B is left invertible, and let g : B → A be a left inverse.
If a₀, a₁ ∈ A and a₀ ≠ a₁ then 1_A(a₀) ≠ 1_A(a₁), so (g ∘ f)(a₀) ≠ (g ∘ f)(a₁), that
is g(f(a₀)) ≠ g(f(a₁)), and so, since g is a function, f(a₀) ≠ f(a₁). Therefore f is
injective.
Now suppose that f is injective. Since A ≠ ∅ we may choose z ∈ A. Define
g : B → A as follows:
Given b ∈ B, either b ∈ f(A) or b ∉ f(A). In the former case there exists a ∈ A
with f(a) = b, and this element a is unique because of the injectivity of f, and so it
is legitimate to define g(b) := a. In the latter case we define g(b) := z. Thus our
prescription yields a well-defined function
g : B → A. And now for any a ∈ A, g(f(a)) = a by definition of g, that is g ∘ f = 1_A.
Hence f is left invertible.
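For finite sets the construction in this proof can be carried out explicitly. In the sketch below, z is the arbitrarily chosen element of A used for points of B outside the image:

```python
A = {0, 1, 2}
B = {0, 1, 2, 3, 4}
f = {0: 1, 1: 3, 2: 4}   # an injective f : A -> B

z = 0                    # some chosen element of A (possible since A is non-empty)

# g(b) := the unique a with f(a) = b when b is in f(A), and z otherwise
g = {b: next((a for a in A if f[a] == b), z) for b in B}

# g is a left inverse: g ∘ f is the identity on A
print(all(g[f[a]] == a for a in A))  # True
# but f ∘ g need not be the identity on B, since f is not surjective here:
print(all(f[g[b]] == b for b in B))  # False
```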
Exercise 4.10. Prove part (2) of the theorem. That is, prove that a function
f : A → B is right invertible if and only if it is surjective. [Note that for this we do
not, in fact, need that A ≠ ∅.]
5 Writing mathematics
Mathematics is notorious for having a language of its own. Why? Well, there are many
answers. We deal with concepts such as numbers, sets, relations, functions, that are subtly
different from their counterparts in ordinary discourse. They are different in that their
definitions have been carefully formulated, and the words have acquired precise technical
meanings within mathematics. To be acceptable, the reasoning we employ about these
objects also has to be very precise.
5.1 Errors to avoid
In everyday life we use methods of reasoning that might be wrong; in ordinary life,
uncertain knowledge can be better than no knowledge at all. Here are some examples of
how mathematical language and reasoning can differ from what people are used to.
"Theorem" . All odd numbers greater than 1 are prime.
"Proof." 3 is prime, 5 is prime, 7 is prime, etc.
In mathematics, we never make a claim about all members of some set, unless either
we examine every single one (which is practical only if the set is finite and smallish) , or
we have some method that works equally well for all members of the set. Compare the
above with the following argument that all primes greater than 2 are odd.
If n > 2 and n is even then 2 is a proper divisor of n, so n is not prime.
Therefore if n > 2 and n is prime then n cannot be even, hence n is odd.
The form of generalisation in the absurd "proof" above often works for us in real life (to
make inferences about all wolves, all electrons, and the like from a limited sample), but
it is illegitimate in mathematics.
"Theorem" . = 1.
"Proof". If a = 2, then a 2 = 4. Now let x := 0. Then {Ax - 2) 2 = 4. Therefore
Ax - 2 = 2, so x = 1. That is, = 1.
This argument contains a slightly hidden step looking like this: if P then Q; Q;
therefore P (with P being the statement 'a = 2', and Q the statement 'a² = 4'). This
rule is of course very doubtful in any situation, but in ordinary life it might allow us to
guess that someone is in a room because we've heard their voice even though it might
be a recording (if Elizabeth is in that room then that is the voice I would hear; that is the
voice I have heard; therefore Elizabeth is in that room), or it might allow Sherlock Holmes
to deduce that horses have been past from their hoofprints (they could be a fake). In
mathematics such reasoning is not allowed: we want certainty.
Some ways of reasoning are wrong under any circumstances. 'Begging the question'
means assuming what you set out to prove, as in the following example:
"Theorem". 2 + 2 = 5.
"Proof" . If 2 + 2 = 5 , then, cancelling 2 from both sides, 2 = 3. Subtract 5 to
get that — 3 = — 2 and then subtract 2: — 5 = — 2 — 2. Now multiply by —1 to see that
5 = 2 + 2, that is, 2 + 2 = 5, as required. Hm!
Another rule we use in everyday life is what we might call deference to experts; if
someone is an expert in a particular area, then we (often) believe what they say just for
that reason. Use this rule only with great care! Do not automatically believe what you
read in books. And if your lecturers say something strange then they may have made a
mistake; think critically and if you are right then please correct them (politely).
Exercise 5.1. Explain what is wrong with the following (distressingly common)
attempted proof of the AM-GM Inequality for two non-negative real numbers, and
show how to turn it into a correct argument.
We are given that a ≥ 0, b ≥ 0. Suppose that √(ab) ≤ ½(a + b). Then
2√(ab) ≤ a + b, so 4ab ≤ (a + b)². Expanding and tidying we see that
0 ≤ a² − 2ab + b², that is, 0 ≤ (a − b)². Since squares are always positive
this is true. Therefore √(ab) ≤ ½(a + b), which is the AM-GM Inequality.
5.2 The language of mathematical reasoning
Among the most important words in a mathematical argument are the logical words.
They need to be used carefully. Most of them hold no surprises, but some have meanings
that are a little different in mathematics from their common meanings in everyday life.
If, only if. In ordinary discourse the word 'if' usually carries an implication that
may have something to do with causation or necessity. For example, when we say 'if
you throw a stone at a window, then the glass will break', we are not merely
making a prediction, we are implying that the stone will cause the glass to break. Such
implications are absent in mathematics. In mathematics, 'if P then Q' (where P and
Q are assertions) simply means that whenever P holds, Q does too; equivalently, that
either P is false, or Q is true. So the statement 'if P then Q' is not held to assert any
other connection, causal or otherwise, between P and Q. Thus, for example, both 'If
Paris is the capital of France then the Thames flows through London' and 'If Oxford is
on Mars then I am 100 metres tall' are true statements.
The following all mean the same:
(1) if P then Q;
(2) P implies Q;
(3) P only if Q;
(4) P is a sufficient condition for Q ;
(5) Q is a necessary condition for P ;
(6) if Q does not hold then P does not hold;
(7) whenever P holds, Q also holds.
In order to prove a statement of the form 'if P then Q', one typically starts by assuming
that P holds and one tries to derive Q, or one starts by assuming that Q does not hold
and tries to derive that then also P must not be true. We'll return to this point later.
Notice that 'if P then Q' and 'if not Q then not P' are different ways of saying the
same thing. After all, if P is false whenever Q is false, then when P is true Q must
necessarily be true too. The assertion 'if not Q then not P' is known as the contrapositive
of 'if P then Q'. We'll return to this point later too.
Note that the contrapositive is very different from the converse. The converse of 'if P
then Q' is 'if Q then P'. The former can very well be true without the latter being true.
For example 'if Cambridge is on the moon then Oxford is in England' is true because
Cambridge is not on the moon, but its converse 'if Oxford is in England then Cambridge
is on the moon' is false since here our assertion P is true whereas our Q is false. Again,
it is true that if 1 = 0 then 1 < 2, but it is not true that if 1 < 2 then 1 = 0.
The symbol ⇒ is used to mean 'implies'. We use it only in formulae and only when
an implication is the mathematical statement we wish to make. Thus for example to say
that a relation R on a set A is symmetric (Definition 4.3) is to say that aRb ⇒ bRa
whenever a, b ∈ A; the definition of transitivity could be written (aRb and bRc) ⇒ aRc
for all a, b, c ∈ A.
Warning: never misuse ⇒ to mean 'then' (as in 'if x = −1 ⇒ x² = 1'). Never
misuse ⇒ to mean 'therefore' (as in '(√a − √b)² ≥ 0 ⇒ a + b − 2√(ab) ≥ 0 ⇒ √(ab) ≤
½(a + b)'). We always read P ⇒ Q as 'if P then Q' or as 'P implies Q'.
If and only if. The statement 'P if and only if Q' means 'if P then Q AND if
Q then P'. This can be rephrased 'P and Q are equivalent'. Usually one proves such
a statement by proving 'if P then Q' and 'if Q then P' separately. The phrase 'P is a
necessary and sufficient condition for Q' means exactly the same thing.
You'll find that some (many) people use 'iff' as an abbreviation for 'if and only
if'. Personally, I dislike it. To my eye it looks like a misprint, and indeed, it is often
misprinted or misread.
Not, and, or. There is little to be said about the so-called connectives 'not' and
'and'. An assertion 'not P' will be true when P is false and false when P is true. An
assertion 'P and Q' will be true when P and Q are both true, false otherwise. In
ordinary discourse the word 'or' in 'one or the other' sometimes carries overtones of 'but
not both'. That is never the case in mathematical usage. We always interpret 'P or Q'
to mean that P holds or Q holds or both do.
Quantifiers. Quantifiers are expressions like for all or for every, which are known
as universal quantifiers; for some or there exist (or there exists), known as existential
quantifiers. Examples of statements with quantifiers:
• every prime number greater than 2 is odd;
• for every natural number n, either n is a perfect square, or √n is irrational;
• there exists a real number x such that x³ − 103x² + 2 = 0;
• some prime numbers have two decimal digits.
Note that a quantifier includes specification of a range: all prime numbers, some real
number(s), or whatever. We have symbols ∀, ∃ for use in formulae (and they must never
be used as lazy abbreviations for words in text). Thus, for example, if we use P to denote
the subset of ℕ consisting of prime numbers then these statements could be formulated as:
• ∀p ∈ P : if p > 2 then p is odd;
• ∀n ∈ ℕ : (∃m ∈ ℕ : n = m²) or (√n ∉ ℚ);
• ∃x ∈ ℝ : x³ − 103x² + 2 = 0;
• ∃p ∈ P : 10 ≤ p < 100.
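Over finite ranges the quantifiers ∀ and ∃ correspond to Python's all and any, so statements like these can be tested by machine (the primality test below is our own helper, and we check only a finite sample of P):

```python
def is_prime(n):
    # crude trial division, adequate for small n
    return n >= 2 and all(n % d != 0 for d in range(2, n))

primes_below_200 = [p for p in range(200) if is_prime(p)]

# ∀p ∈ P : if p > 2 then p is odd  (checked for the primes below 200)
print(all(p % 2 == 1 for p in primes_below_200 if p > 2))  # True

# ∃p ∈ P : 10 ≤ p < 100
print(any(10 <= p < 100 for p in primes_below_200))  # True
```

Of course such a check over a finite sample proves nothing about all of ℕ, exactly the point made by the bogus "all odd numbers are prime" argument earlier.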
Exercise 5.2. Which of the following statements about natural numbers are
true?
(a) 2 is prime or 2 is odd.
(b) 2 is prime or 2 is even.
(c) If 2 is odd then 2 is prime.
(d) If 2 is even then 2 is prime.
(e) For all n ∈ ℕ, if n is a square number then n is not prime.
(f) For all n ∈ ℕ, n is not prime if and only if n is a square number.
(g) For all even primes p > 2, p² = 2012.
The mathematical meanings of quantifiers can be a little different from what they are
in ordinary English. When we say 'for all positive real numbers x, there exists a real
number y such that x = y²', the meaning (that every positive real number has a real
square root) is completely clear. Perhaps slightly less clear is that the assertion 'all even
primes p greater than 3 have exactly nine digits' is true. There are no even primes p
greater than 3, so all of them do have exactly nine digits. The statement is, as we say,
vacuously true. So is 'all members of ∅ are infinite', which may look paradoxical as an
English sentence, but happens to be true. In ordinary language, when I say that there
exist people who live in France, I assert that there is at least one person who lives in
France, but I also suggest that there are some people who do not. Such suggestions are
absent in mathematics. Thus, a statement ∃x ∈ ℝ : P(x) means precisely that there is
at least one real number that has the property P. It means neither more nor less.
Warning: never get quantifiers in the wrong order. Consider the statement:
for every house H, there exists A such that A is the address of H. That is a ponderous
way of saying that every house has an address. Now consider the statement: there exists
A such that for every house H, A is the address of H. What does this statement
mean? Well, if it is true, then there is an address A (it might be 10 Downing Street,
for example) which has the remarkable property that for every house H, the address of
H is 10 Downing Street. The statement means, in fact, that every house has the same
address. Thus if ℋ is the set of all houses and 𝒜 the set of all addresses then
∀H ∈ ℋ : ∃A ∈ 𝒜 : A = address(H)  and  ∃A ∈ 𝒜 : ∀H ∈ ℋ : A = address(H)
say very different things. The former is true, the latter false. The order of quantifiers
really matters. Great care is needed because English can be ambiguous. For example,
what does the following statement mean?
For all natural numbers x, x < y for some natural number y.
Does it mean
for all x ∈ ℕ, there exists y ∈ ℕ such that x < y? (★)
or does it mean
there exists y ∈ ℕ such that for all x ∈ ℕ, x < y? (★★)
Of these, (★) is true since y could be x + 1 for example, while (★★) is false because no
matter how big y is there is some x which is bigger. In ordinary language, it is often
unclear what logical order quantifiers are supposed to come in; we rely on context and
common sense to guess intelligently (and our guesswork is so intelligent that we usually
do not notice that there could be a problem). Mathematics, however, is unforgiving.
Sloppiness with quantifiers is inexcusable.
Exercise 5.3. Formulate Theorem 2.2, Mathematical Induction, using the
symbols ∀ and ⇒.
Exercise 5.4. Let f : ℝ → ℝ. Translate the following formula into English:
∀a ∈ ℝ : ∀ε ∈ ℝ>0 : ∃δ ∈ ℝ>0 : ∀x ∈ ℝ : |x − a| < δ ⇒ |f(x) − f(a)| < ε.
Exercise 5.5. Let f : ℝ → ℝ. Express the following using the symbols ∀, ∃
and ⇒:
for all real numbers a and x and for every positive real number ε
there exists a positive real number δ such that |f(x) − f(a)| < ε
whenever |x − a| < δ.
Does it say much the same as the formula in the preceding exercise?
5.3 Handling negation
Let us return briefly to the word 'not'. Given an assertion P the assertion not P is true
if P is false and it is false if P is true. That is, not P holds if and only if P does not.
The following rules, in which ⇔ is used as a symbol for if and only if or is equivalent to,
are basic.
Theorem 5.1 [Some basic rules for negation]. Let P, Q be propositions, that
is, assertions, perhaps about a member x of a set X. Then
(1) not (not P) ⇔ P ;
(2) not (P and Q) ⇔ (not P) or (not Q) ;
(3) not (P or Q) ⇔ (not P) and (not Q) ;
(4) not ∀x ∈ X : P(x) ⇔ ∃x ∈ X : not P(x) ;
(5) not ∃x ∈ X : P(x) ⇔ ∀x ∈ X : not P(x) .
Where do these come from? Well, (1) should be clear. As for (2), the conjunction
P and Q is false if and only if it is not the case that both P and Q hold, that is to say,
one of not P and not Q must be true, so (not P) or (not Q) holds. The justification for (3)
is similar. What (4) is saying is that if it is not the case that P(x) holds for every x ∈ X
then at least one x ∈ X fails to satisfy P, and of course vice versa. And what (5) says
is that if there are no members of the set X for which P(x) holds then every member of
X fails to satisfy the condition P, and vice versa.
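For P and Q ranging over truth values, and for properties of a small finite set, the five rules of Theorem 5.1 can be checked exhaustively. Here is a sketch (the encoding of a property of X as a bitmask is just one convenient choice):

```python
from itertools import product

# Exhaustive check of rules (1)-(3) of Theorem 5.1 over all truth values,
# and of (4)-(5) over every property P of a small finite set X.
for P, Q in product([False, True], repeat=2):
    assert (not (not P)) == P                                # rule (1)
    assert (not (P and Q)) == ((not P) or (not Q))           # rule (2)
    assert (not (P or Q)) == ((not P) and (not Q))           # rule (3)

X = range(8)
for mask in range(2 ** 8):          # every property on X, coded as a bitmask
    P = lambda x, m=mask: bool(m >> x & 1)
    assert (not all(P(x) for x in X)) == any(not P(x) for x in X)   # rule (4)
    assert (not any(P(x) for x in X)) == all(not P(x) for x in X)   # rule (5)
print("all five rules verified")
```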
Examples 5.2. Let f : ℝ → ℝ and let a ∈ ℝ. The two assertions
∀ε ∈ ℝ>0 ∃δ ∈ ℝ>0 ∀x ∈ ℝ : if |a − x| < δ then |f(a) − f(x)| < ε,
∃ε ∈ ℝ>0 ∀δ ∈ ℝ>0 ∃x ∈ ℝ : |a − x| < δ but |f(a) − f(x)| ≥ ε
are negations of each other.
For, by (4) and (5) of Theorem 5.1, we can move not past a quantifier provided that
we change ∀ to ∃ and vice versa. Thus an assertion of the form not ∀∃∀ : P is the same
as ∃∀∃ : not P. In the example P is of the form Q ⇒ R and the negation of this (which
must be true if and only if Q ⇒ R is false) is Q but not R, because Q ⇒ R is equivalent
to (not Q) or R. (Notice that 'but' is another form of 'and', though in English it carries
overtones of negative expectations.)
Note: It is common to write 'for every ε > 0' as an abbreviation for 'for every real
number ε > 0'. You will often see ∀ε > 0 standing for ∀ε ∈ ℝ>0. Thus the assertions in
Examples 5.2 would often be written
∀ε > 0 ∃δ > 0 ∀x ∈ ℝ : |a − x| < δ ⇒ |f(a) − f(x)| < ε,
∃ε > 0 ∀δ > 0 ∃x ∈ ℝ : |a − x| < δ but |f(a) − f(x)| ≥ ε.
Exercise 5.6. Let f : ℝ → ℝ. Organise the following assertions about the
function f into pairs such that one member of the pair is true if and only if
the other is false:
(1) for all real numbers x, y, there exists a real number z > x such that
f(z) > y;
(2) for every real x there exists a real number y such that for all real z either
z ≤ y or f(z) ≠ x;
(3) there exist real numbers x, y such that for every real number z, if z ≤ y
then f(z) = x;
(4) there is a real number x such that for every real y there is a real number
z > y for which f(z) = x;
(5) for all real x and y there is a real number z such that z < y and
f(z) ≠ x;
(6) there exist real numbers x and y such that for every real z either z < x
or f(z) ≤ y.
5.4 Formulation of mathematical statements
It is important to understand correctly the logical form of a theorem or a problem. The
most common form is 'If P then Q ' though there are a number of variations on the actual
words we use. In this context, P is the hypothesis and Q the conclusion. In the following
examples, the hypothesis is introduced with the symbol <, the conclusion with >.
Examples 5.3.
Theorem A. < Suppose that the polynomial p(x) with real coefficients has odd degree.
> Then p(x) has a real root.
Theorem B. < If n is a natural number > then n has a unique prime factorisation.
Theorem C. < Whenever f is a continuous function on ℝ, and a, b are real numbers
such that a < b, f(a) < 0 and f(b) > 0, > there exists a real number c ∈ (a, b) such
that f(c) = 0.
Note that hypothesis and conclusion are not always quite so clearly visible. For
example, Theorem A could have been put in the form
Every polynomial with real coefficients and odd degree has a real root.
It is, of course, important to interpret statements of theorems correctly. It is also im-
portant to write down your own theorems clearly, so that someone else can easily work
out what the hypothesis is, and what is the conclusion. For example, the formulation of
Theorem C above is not particularly reader-friendly. Hypothesis and conclusion could be
exhibited more clearly, for example by breaking the one long sentence into two or more:
Let f be a continuous function on ℝ, and let a, b be real numbers such
that a < b, f(a) < 0 and f(b) > 0. Then there exists a real number
c ∈ (a, b) such that f(c) = 0.
Here is a much worse example:
Theorem D. Whenever f : [0, 1] → ℝ is a continuous function, f is
differentiable and f(0) = f(1), f attains a greatest and a least value,
and there exists c ∈ (0, 1) such that f′(c) = 0.
It seems to start by saying that every continuous function f : [0, 1] → ℝ is
differentiable and satisfies f(0) = f(1), but that is nonsense. Don't write like this: instead be
clear and orderly.
Exercise 5.7. Reformulate Theorem D to make completely clear what is
meant.
6 Proofs and refutations
Proofs in mathematics have to stand up to rigorous examination. They need to be
completely logical; they need to be capable of being thoroughly checked. Ideally, though,
they should be more than that. They should help the intuition to understand what lies
behind a theorem, what its context is and what it 'means'. Thus a proof should have a
clear structure. Let's examine some of the possibilities.
6.1 Direct proof
The concept of direct proof is very simple: to prove a statement of the form if P then Q
we start from P and seek to reach Q by legitimate reasoning. Here's an example — one
possible response to Exercise 5.1 above.
Theorem 6.1. Let a, b be non-negative real numbers. Then √(ab) ≤ ½(a + b).
Proof. Being non-negative, a, b have non-negative real square roots. All real squares
are non-negative, so (√a − √b)² ≥ 0 (moreover, equality holds if and only if a = b).
Expanding, we see that a − 2√a√b + b ≥ 0. Rearranging this we get that √(ab) ≤ ½(a + b),
as required. Moreover, we see that equality holds if and only if a = b.
Notice that this is a good (if particularly simple) example of a direct proof. It starts
from the assertion P that a, b are non- negative real numbers and moves forward to the
conclusion Q which is the AM-GM Inequality. Notice also that it gives a little more
information than is in the statement of the theorem.
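A direct proof leaves nothing to check, but it can still be reassuring to experiment. The following sketch samples random non-negative pairs and tests the inequality numerically (the small tolerance guards against floating-point rounding; this is an experiment, not a proof):

```python
import math
import random

# Numerical spot-check of Theorem 6.1: for non-negative a, b we expect
# sqrt(a*b) <= (a + b)/2, with equality exactly when a = b.
random.seed(1)
for _ in range(1000):
    a = random.uniform(0, 100)
    b = random.uniform(0, 100)
    assert math.sqrt(a * b) <= (a + b) / 2 + 1e-12   # tolerance for rounding
a = b = 7.5
print(math.isclose(math.sqrt(a * b), (a + b) / 2))  # True: equality when a = b
```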
Exercise 6.1. Use the Binomial Theorem to prove that if n is any natural
number then n < 2ⁿ.
[Although this lends itself to proof by induction, the exercise is to give a simple
direct proof.]
Exercise 6.2. Use the Binomial Theorem to prove that if a is any positive
real number and k is any natural number then there exists N ∈ ℕ such that
nᵏ < (1 + a)ⁿ for all natural numbers n such that n > N. In symbols the
theorem is ∀a > 0 ∀k ∈ ℕ ∃N ∈ ℕ ∀n ∈ ℕ : if n > N then nᵏ < (1 + a)ⁿ.
6.2 Proof by contradiction
Since the contrapositive if not Q then not P is equivalent to if P then Q one can prove
the latter by giving a direct proof of the former. This is known as proof by contradiction
or, in older books, as reductio ad absurdum, reduction to an absurdity. Here is a simple
example — another possible response to Exercise 5.1 above.
Theorem 6.2. Let a, b be non-negative real numbers. Then √(ab) ≤ ½(a + b).
Proof. Suppose, seeking a contradiction, that √(ab) > ½(a + b). Then 2√(ab) > a + b
and so 4ab > (a + b)², that is, 4ab > a² + 2ab + b². Subtract 4ab from both sides:
0 > a² − 2ab + b² = (a − b)². This is a contradiction since squares of real numbers are
non-negative. Therefore the assumption must be wrong, that is, √(ab) ≤ ½(a + b), as
required.
There are two criticisms to be made of this argument. One is that it hides the need for
the assumption that both a and b are non-negative. Where does that enter? The other is
that it does not quite so easily tell us the condition for equality. This example illustrates
two matters of style: first, always indicate early and explicitly that you are using the
technique; second, if you find a proof by contradiction, it is always worth thinking
whether you might turn the ideas round and derive a simple direct proof. After all, a
long proof by contradiction can be a little unconvincing: the final contradiction should
have arisen because the assumption not Q is untenable, but it might have arisen from an
error in the argument. Take a look at the famous and very important paper 'Solvability
of groups of odd order' by Walter Feit and John G. Thompson in Pacific Journal of
Mathematics, 13 (1963), pp. 775-1029. It is a huge proof by contradiction which concludes
on its 253rd page of exceptionally technical and intricate and clever argument with the
words 'This contradicts Lemma 38.11, and completes the proof of the main theorem of
this paper.' How sure can we be that the contradiction is not the consequence of an error
made somewhere around the 200th page? As it happens the proof is correct. And we
know of no way other than use of contradiction to prove the Feit-Thompson Theorem.
Even so, for less complex theorems it is a technique to be used with great care. It is,
however, a powerful technique. Here is another example.
Theorem 6.3. Let X be any set. No function f : X → ℘X is surjective.
Proof. Suppose, seeking a contradiction, that there does exist a surjective function
f : X → ℘X. For each x ∈ X the image f(x) is a subset of X and we can ask whether
or not x lies in it. We focus on those x that do not lie in their image, and define
A := {x ∈ X | x ∉ f(x)}. Then A ⊆ X, that is, A ∈ ℘X. Since f is surjective there
exists a ∈ X such that A = f(a). Now either a ∈ A or a ∉ A. If a ∈ A then a ∈ f(a),
but, by definition of A, a ∉ f(a). And if a ∉ A, so a ∉ f(a), then, by definition of A,
a ∈ A. This is a contradiction. Therefore (unless there is a slip in the reasoning) the
assumption must be wrong, that is, there is no surjective function X → ℘X.
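For a finite set the theorem can even be confirmed by brute force, and the diagonal set A can be computed explicitly. A sketch with |X| = 3 (the particular f below is chosen arbitrarily for illustration):

```python
from itertools import product

# Brute-force companion to Theorem 6.3 for a small finite set: among all
# 8**3 functions f : X -> powerset(X), none is surjective (|X| = 3 only).
X = [0, 1, 2]
subsets = [frozenset(s) for s in
           [(), (0,), (1,), (2,), (0, 1), (0, 2), (1, 2), (0, 1, 2)]]

surjective_found = any(set(images) == set(subsets)
                       for images in product(subsets, repeat=len(X)))
print(surjective_found)  # False: 3 values cannot cover all 8 subsets

# Cantor's diagonal set for one particular f, exactly as in the proof above:
f = {0: frozenset({1, 2}), 1: frozenset(), 2: frozenset({2})}
A = frozenset(x for x in X if x not in f[x])   # A = {x in X | x not in f(x)}
print(A in f.values())  # False: A is never in the image of f
```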
The above theorem and proof are a version of Cantor's argument that for any set X,
whether finite or infinite, the cardinal number (that is, the size) of X is strictly smaller
than the cardinal number of its power set ℘X. This is of great importance in
mathematics. It is also of great importance historically — it led directly to Cantor's Paradox,
that on the one hand there cannot be a largest set, but on the other hand the set of all
sets should obviously be a largest set, and it led nearly as directly to Russell's Paradox
focussing on the set of all sets that are not members of themselves. These paradoxes led
in their time to an understanding that Set Theory needed to be formulated considerably
more carefully than I have done in my naive description of it in § 3. That is why Set The-
ory proper is an advanced topic offered in the third and fourth years of the Mathematics
course (though in the second year of Mathematics & Philosophy). The naive version,
however, serves adequately as a good language for mathematics — just so long as we do
not try to push it too far.
Here are two classical examples where proof by contradiction is used.
Exercise 6.3. Let n ∈ ℕ. Show that either n is a perfect square (that is,
there exists m ∈ ℕ such that n = m²), or √n ∉ ℚ.
Exercise 6.4. Let P := {n ∈ ℕ | n is prime}, the set of prime numbers.
Show that P is infinite.
6.3 More on Mathematical Induction
Mathematical Induction was introduced in Section 2.2 above, and a typical example of
proof by induction was given there. It is so important as a tool for proving theorems that
it is worth thinking of variants of it. In this subsection we treat two other versions.
Theorem 6.4 [Strong Induction]. Let P be a property of natural numbers, that
is, a statement P(x) about natural numbers x that may be true for some natural numbers
x and false for others. Suppose that for all natural numbers n , if P(m) is true for all
natural numbers m < n then P(n) is true. Then P(n) is true for all natural numbers n.
Although this looks superficially stronger than Theorem 2.2, because in the inductive
step one may assume more while the conclusion is the same, the two are in fact equivalent. We can deduce it
from Theorem 2.2 as follows. Suppose that P is as described. Let Q(x) be the property
that for all natural numbers m < x, P(m) is true. Then Q(0) is true, because there are
no natural numbers m < 0, so P(m) is true for every m < 0. If Q(n) is true, then for
all m < n, P(m) holds. So by assumption, P(n) holds also. Thus P(m) holds for all
m < n + 1, that is, Q(n + 1) holds. Therefore by Mathematical Induction, Q(n) holds
for all natural numbers n. It follows, of course, that for all natural numbers n, Q(n + 1)
holds; that is, for all m < n + 1, P(m) holds; in particular, P(n) holds. So P(n) is true
for all natural numbers n.
Here is an example of how strong induction may be used.
Theorem 6.5. Every natural number greater than 1 may be expressed as a product
of one or more prime numbers.
Proof. Let P(x) be the assertion that either x ≤ 1 or x may be expressed as a
product of prime numbers. Let n be a natural number and suppose that P(m) holds for
all m < n. If n ≤ 1 then P(n) certainly holds. If n ≥ 2 then either n is prime or n is
not prime. If n is prime then it is the 'product' of the single prime number n. If n is
not prime then there exist r, s > 1 such that n = rs. Then r, s < n, so by the induction
hypothesis, r and s may each be written as a product of prime numbers. Therefore rs
is a product of prime numbers, that is, n is a product of prime numbers. Now by Strong
Induction, P(n) is true for all natural numbers n, that is, every natural number greater
than 1 may be expressed as a product of one or more prime numbers.
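The structure of the strong-induction proof translates directly into a recursive procedure: find any factorisation n = rs with r, s > 1 and recurse on the strictly smaller factors. A sketch using trial division (suitable for small n only):

```python
# A computational rendering of the strong-induction proof of Theorem 6.5:
# to factorise n, find a divisor pair r * s = n with r, s > 1 and recurse
# on the strictly smaller r and s.
def prime_factors(n):
    """Return a list of primes whose product is n (for n >= 2)."""
    for r in range(2, int(n ** 0.5) + 1):
        if n % r == 0:                # n = r * s with 1 < r, s < n
            return prime_factors(r) + prime_factors(n // r)
    return [n]                        # no proper divisor: n itself is prime

print(prime_factors(360))  # [2, 2, 2, 3, 3, 5]
```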
Mathematical Induction may be expressed in a very different way:
Theorem 6.6 [Well-ordering of the natural numbers]. If S is any non-empty
subset of ℕ then S has a least member.
We say that ℕ is well ordered. The principle of Mathematical Induction may be
proved from the fact that ℕ is well ordered in the following way. Let P(x) be a property
of natural numbers such that P(0) is true and whenever P(n) is true then so also is
P(n + 1). Suppose, seeking a contradiction, that P(n) does not hold for all natural
numbers n. Define S := {x ∈ ℕ | P(x) is false}. By assumption S ≠ ∅ and so S must
have a least member m. Now m > 0 since P(0) is true. Therefore m = k + 1 for some
k ∈ ℕ. Since k < m, k ∉ S and therefore P(k) does hold. But then, by assumption,
P(k + 1) also holds, that is, P(m) holds. This contradicts the fact that m ∈ S. Therefore
P(n) must hold for all natural numbers n.
Conversely, we may prove Theorem 6.6 by Strong Induction. Let P(x) be the
statement that every subset S of ℕ such that x ∈ S has a least member. Now suppose that
P(m) holds for every natural number m < n. Let S be any subset of ℕ with n ∈ S.
If there exists m < n such that m ∈ S then, since P(m) holds, S has a least member.
Otherwise n itself is the least member of S. Either way, S has a least member, so P(n)
holds. By Strong Induction P(n) holds for every natural number n. But now, to say
that S is non-empty is to say that there exists x ∈ S. Since we know that P(x) holds,
S must have a least member.
Thus all of induction, strong induction and well-ordering are equivalent. They are
different ways of saying the same thing. Sometimes one is more convenient, sometimes
another. In applications of well-ordering one usually seeks what Reinhold Baer
[1902–1979] used to call 'the least criminal'. That is to say, we are trying to prove some
statement involving natural numbers, and we do it by contradiction: if the statement is
false then there will be a least natural number for which it is false. This he called the least
criminal. The leastness of the criminal gave extra information about it which, suitably
exploited, could give a contradiction. Indeed, this is how the proof by Feit & Thompson
quoted above goes: they suppose their theorem false and they study a least criminal.
That least criminal will be a non-solvable group of least possible odd order (whatever
that may mean). The leastness of its order gave them a huge amount of information
about it, information which led, in the end, to the killer contradiction. Here, to illustrate
the idea, is an alternative proof of Theorem 6.5.
Example 6.7. An alternative proof that if n ≥ 2 then n may be expressed as a
product of prime numbers.
Suppose the statement were false. Then there would be a least natural number n > 1
that is not expressible as a product of prime numbers. It cannot itself be prime, so n = rs
where 1 < r < n and 1 < s < n. Being smaller than n, each of r, s must be expressible
as a product of prime numbers, whence n is such a product after all. This contradiction
proves the theorem.
6.4 Refutation
To refute: to prove a statement to be false or incorrect; to disprove.
Here is an observation. Let f(x) := x² + x + 41. Then
f(−1) = f(0) = 41,   f(−2) = f(1) = 43,   f(−3) = f(2) = 47,
f(−4) = f(3) = 53,   f(−5) = f(4) = 61,   . . . ,
f(20) = f(−21) = 461,   . . . ,   f(−31) = f(30) = 971,   . . . .
As far as the eye can see, f(k) is prime for k ∈ ℤ. It is not unreasonable therefore to
make the following
Conjecture: if f(x) := x² + x + 41 then f(k) takes only prime values, that is,
f(k) is a prime number for every k ∈ ℤ.
Nonsense! Although, as it happens, f(k) is prime for −40 ≤ k ≤ 39, it is clear that
f(40) = 41 × 41, so it is composite. One counterexample is enough to refute a conjecture.
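The refutation is easy to mechanise. A sketch (with a naive trial-division primality test, adequate for numbers of this size):

```python
# Checking the refutation: f(k) = k*k + k + 41 is prime for -40 <= k <= 39,
# but f(40) = 41*41 = 1681 is composite -- one counterexample kills the conjecture.
def is_prime(n):
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

f = lambda k: k * k + k + 41
print(all(is_prime(f(k)) for k in range(-40, 40)))  # True
print(f(40), is_prime(f(40)))                       # 1681 False
```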
7 Problem-solving in mathematics
There cannot be rules and recipes for problem-solving in mathematics. If there were
then research would be routine (and perhaps not much fun). To solve problems you
need to work out what they are about, think about what mathematics might be relevant,
experiment, pursue ideas that might work and recognise when they show clear signs of
not working, think laterally, be devious. Very roughly, the problems that one meets in
an undergraduate course are either 'closed' or 'open-ended'. The former are of the form
'prove this', 'calculate that', where you might have a pretty good idea of the goal, but the
problem is to find a good way to get there. An open-ended problem might be a question
of the form 'is it true that ...?' or 'when is it true that ...?' or 'what can you discover
about ...?' which is a bit like a piece of research.
7.1 Some techniques and examples
What techniques do we have? Well, if we are asked to prove an assertion of the form
if P then Q then there are several possibilities. We could try making deductions from
the assumption that P is true and seek to close in on Q . Or (hoping to use proof by
contradiction) we could assume that Q is false, pursue a line of reasoning, and seek to
discover that then P would have to be false. Or we could do both, and hope that our
two lines meet somewhere in the middle. Or, if P and Q depend on a parameter, as they
often will, we could experiment with special values of that parameter and see if they give
us some insight. If they depend on a parameter n ∈ ℕ we might try induction. The first
steps in problem-solving are experimentation, pattern-spotting, conjecture. They are
followed by refutation, in which case go back and start again, or verification, writing-up
and celebration. In that order: never celebrate until you have completed writing up your
solution as a fair copy.
Here is an example.
Problem A. Let a₁, a₂, . . . , aₙ and b₁, b₂, . . . , bₙ be real numbers and suppose
that a₁ ≤ a₂ ≤ ⋯ ≤ aₙ. Let c₁, c₂, . . . , cₙ be the numbers b₁, b₂, . . . , bₙ in some
order. Prove that a₁c₁ + a₂c₂ + ⋯ + aₙcₙ is maximised when c₁ ≤ c₂ ≤ ⋯ ≤ cₙ and is
minimised when c₁ ≥ c₂ ≥ ⋯ ≥ cₙ.
It is a closed question. We know what we are aiming for and the problem is to find
an appropriate line of reasoning. Where can we start? We start with rough scribbling,
back-of-an-envelope experimentation. One common starting point is trying small values
of n . What does 'small' mean here? Well, it might mean n = 1 , n = 2, n = 3, possibly
n = 4 . If we have not spotted what matters and what does not matter by the time we
get to n = 5 then we should probably try a different strategy.
Back-of-an-envelope experimentation. In this case, for n = 1 there is nothing to
prove — the problem is non-trivial only for n > 1. For n = 2 the situation is relatively
simple because either b₁ ≤ b₂ or b₂ ≤ b₁. Re-numbering if necessary we may suppose
that b₁ ≤ b₂. So now the question is, which is the larger of a₁b₁ + a₂b₂ and a₁b₂ + a₂b₁
(given that a₁ ≤ a₂ and b₁ ≤ b₂)? What strategy do we have for testing an inequality?
Well, to discover whether or not A < B we could examine A — B and seek to discover
whether it is positive, negative or zero; or we could try A/B and seek to discover whether
it is greater or smaller than 1 . Either technique would replace the two numbers A, B
with just one number. Division, though, has to be used with great care — we need to
know beforehand that B ≠ 0; and indeed, since we are working with inequalities we need
to know whether or not B is positive: if so then division by B preserves inequalities,
whereas if B is negative then division by B reverses inequalities. It is usually (though
not always) safer and easier to use subtraction.
So let's look at the difference D := (a₁b₁ + a₂b₂) − (a₁b₂ + a₂b₁). Aha! It factorises:
D = (a₁ − a₂)(b₁ − b₂). Since a₁ − a₂ ≤ 0 and b₁ − b₂ ≤ 0, in fact D ≥ 0. Thus in this
case a₁c₁ + a₂c₂ is maximised when c₁ ≤ c₂ and minimised when c₂ ≤ c₁, just as the
examiner predicted.
Moving to the case n = 3 we find we are in different territory because we cannot
simply assume that b₁ ≤ b₂ ≤ b₃. There are six different possibilities; or perhaps three
insofar as any one of the bᵢ could be between the other two. But can we use the case
that we have already dealt with? After all, if c₁ > c₂ then (as the case n = 2 tells us)
a₁c₁ + a₂c₂ ≤ a₁c₂ + a₂c₁, so interchanging c₁ and c₂ would increase (or at least, not
decrease) the sum a₁c₁ + a₂c₂ + a₃c₃. And now we have the key. So now we write out
our fair-copy answer.
Fair copy answer. Consider the ordering c₁, c₂, . . . , cₙ of the numbers b₁, b₂, . . . , bₙ
and suppose that for some r in the range 1 ≤ r < n we have cᵣ > cᵣ₊₁. Let S := Σ aᵢcᵢ
and let T be the same sum except that cᵣ, cᵣ₊₁ are interchanged. All except two of
the summands of T are the same as those of S, so they cancel on subtraction, and
S − T = (aᵣcᵣ + aᵣ₊₁cᵣ₊₁) − (aᵣcᵣ₊₁ + aᵣ₊₁cᵣ). Thus S − T = (aᵣ − aᵣ₊₁)(cᵣ − cᵣ₊₁),
so S − T ≤ 0 since aᵣ ≤ aᵣ₊₁ and cᵣ > cᵣ₊₁. Hence interchanging cᵣ and cᵣ₊₁ would
increase Σ aᵢcᵢ, or at least, not decrease it. Therefore Σ aᵢcᵢ is as large as possible when
cᵣ ≤ cᵣ₊₁ for 1 ≤ r < n, that is, c₁ ≤ c₂ ≤ ⋯ ≤ cₙ. The same argument shows that
Σ aᵢcᵢ is as small as possible when c₁ ≥ c₂ ≥ ⋯ ≥ cₙ.
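For small n the claim of Problem A can also be confirmed by brute force over all orderings, which is a useful sanity check on the fair-copy argument (random data; an experiment, not a proof):

```python
import random
from itertools import permutations

# Brute-force check of Problem A: over all orderings c of b, the sum of
# a[i]*c[i] is largest when c is increasing and smallest when c is
# decreasing, given that a is increasing.
random.seed(0)
for _ in range(20):
    n = 5
    a = sorted(random.randint(-10, 10) for _ in range(n))
    b = [random.randint(-10, 10) for _ in range(n)]
    sums = {c: sum(x * y for x, y in zip(a, c)) for c in permutations(b)}
    assert sums[tuple(sorted(b))] == max(sums.values())                  # c increasing
    assert sums[tuple(sorted(b, reverse=True))] == min(sums.values())   # c decreasing
print("rearrangement inequality confirmed on all samples")
```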
Here is another example.
Problem B. How many solutions are there of the equation x₁ + x₂ + ⋯ + xₘ = 2012
in which m and every xᵢ are positive integers?
It is an open question in that it asks us to find a number. But however can we get
started? Well, as always, we start by jotting on scrap-paper.
Back-of-an-envelope experimentation. Can we spot some solutions? Well, yes:
m = 1, x₁ = 2012 is a solution; so is m = 2012, xᵢ = 1 for 1 ≤ i ≤ 2012. Don't
laugh! Boundary values, very special cases, can sometimes help us on our way. In this
case, though, perhaps all they do is draw attention to the hugeness of the problem — they
suggest how to find all solutions with m = 2 and m = 2011, perhaps, but give no real
insight into what happens in midstream.
This has told us very little. It has told us two things, however: first that of course
m ≤ 2012; secondly that 2012 is a very big number. And after all, though a constant,
2012 could be varied. What about trying the same problem, but with 2012 replaced by
a small number? Sometimes such variants of a problem are known as 'toy versions' of it.
Let's ask for the number of solutions of the equation x₁ + x₂ + ⋯ + xₘ = n for small
values of n.
The case n = 1 is a bit too trivial to give us any insight. We see immediately that
there is just the 1 solution, m = 1, x₁ = 1. The case n = 2 is hardly less trivial — there
are just the 2 solutions m = 2, x₁ = x₂ = 1 and m = 1, x₁ = 2. The cases n = 3,
n = 4 are still small enough that we can enumerate all solutions by hand:
1 + 1 + 1,   1 + 2,   2 + 1,   3,
4 solutions for n = 3;
1 + 1 + 1 + 1,   1 + 1 + 2,   1 + 2 + 1,   2 + 1 + 1,   2 + 2,   1 + 3,   3 + 1,   4,
8 solutions for n = 4.
Is there a pattern here? The numbers 1, 2, 4, 8 look familiar. Would it be fair to
conjecture that the number of solutions of the given equation with 2012 replaced by n
is 2ⁿ⁻¹? Looking at the solutions for n = 1, 2, 3, 4 we see that we can get solutions for
the next number up by either adding 1 to the value of the last variable or increasing m
by 1 and giving the new variable the value 1.
Perhaps try using this insight to get up to n = 5 .
Does this work in general?
Think a bit.
Yes!
So we have reached the writing-up stage.
Fair copy answer. We generalise and prove that for n ≥ 1 the number of solutions
to the equation x₁ + x₂ + ⋯ + xₘ = n is 2ⁿ⁻¹. This is certainly true for n = 1 so we
use induction. Suppose that the statement is true for n. For each equality a₁ + a₂ +
⋯ + aₘ = n we make the two equalities b₁ + b₂ + ⋯ + bₘ = n + 1, where bᵢ := aᵢ
for 1 ≤ i < m, bₘ := aₘ + 1, and c₁ + c₂ + ⋯ + cₘ + cₘ₊₁ = n + 1, where cᵢ := aᵢ
for 1 ≤ i ≤ m, cₘ₊₁ := 1. Thus each solution of the equation for n gives rise to two
(obviously different) solutions for n + 1. Does every solution for n + 1 arise in this way?
Consider the solution d₁ + d₂ + ⋯ + dₖ = n + 1. If dₖ = 1 then it arises from the solution
d₁ + d₂ + ⋯ + dₖ₋₁ = n for n, and from no other; whereas if dₖ > 1 then it arises from
the solution d₁ + d₂ + ⋯ + dₖ₋₁ + (dₖ − 1) = n for n, and from no other. Thus every
solution for n gives rise to two solutions for n + 1, and every solution for n + 1 arises
from exactly one solution for n. Therefore the number of solutions for n + 1 is exactly
twice as big as the number for n, which, by inductive hypothesis, is 2ⁿ⁻¹, so the number
of solutions for n + 1 is 2ⁿ. Hence by induction, for every positive integer n, the number
of solutions of the given equation is 2ⁿ⁻¹.
Returning to the problem posed we see that the number of solutions of the equation
x₁ + x₂ + ⋯ + xₘ = 2012 in which m and every xᵢ are positive integers is 2²⁰¹¹.
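The count 2ⁿ⁻¹ can be confirmed by direct enumeration for small n. A sketch that builds every composition recursively by choosing the first part:

```python
# Enumerate the ordered sums (compositions) of n directly and compare with
# the closed form 2**(n-1) from the fair-copy answer (small n only).
def compositions(n):
    if n == 0:
        return [[]]
    return [[first] + rest
            for first in range(1, n + 1)
            for rest in compositions(n - first)]

for n in range(1, 11):
    assert len(compositions(n)) == 2 ** (n - 1)
print(len(compositions(4)), compositions(3))  # 8, and the 4 compositions of 3
```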
Commentary. Several points about this are worth noting. First, that the general
problem turned out to be tractable, whereas the given problem, a special case of it, looked
out of reach. This happens quite frequently. Here we could use induction to handle the
problem with 2012 replaced by n, whereas for the problem as posed there were difficulties
with the horrible hugeness of 2012.
Secondly, the question is about what are called compositions of an integer. A
composition of n is a sequence (a₁, a₂, . . . , aₘ) of positive integers adding to n. The numbers aᵢ
are known as the 'parts' of the composition. An unordered sum a₁ + a₂ + ⋯ + aₘ = n
(where again the aᵢ are positive integers) is known as a partition of n. The number of
partitions of n, known as the partition function of n, is a far more tricky matter.
Third, there is another way to see why the answer to Problem B is 2²⁰¹¹. Write
2012 = (1 + 1 + 1 + ⋯ + 1) (recall the White Queen's questioning of Alice in Through
the Looking Glass). In this expression the number 1 occurs 2012 times, the symbol +
occurs 2011 times. Take any subset of the set of occurrences of +, and for each + in the
chosen subset insert a closing bracket before it and an opening bracket after it: thus, for
example, 2012 = (1 + 1) + (1 + ⋯) + (1). This gives us a composition of 2012; conversely,
given a composition we replace each part aᵢ by (1 + 1 + ⋯ + 1) of appropriate length,
and get a subset of the original + signs. Thus the number of compositions of 2012 is
the same as the number of subsets of a set of size 2011, hence it is 2²⁰¹¹.
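The White Queen bijection can likewise be carried out explicitly for a toy value of n: code each subset of the + signs as a bitmask and read off the parts between the chosen cuts (a sketch with n = 5 in place of 2012):

```python
# The White Queen bijection in miniature: a composition of n corresponds to a
# subset of the n-1 plus signs in 1+1+...+1; a chosen sign marks where one
# part ends and the next begins.
n = 5

def composition_from_subset(chosen):
    """chosen is a set of cut positions in {1, ..., n-1}; parts are the gaps."""
    cuts = [0] + sorted(chosen) + [n]
    return [b - a for a, b in zip(cuts, cuts[1:])]

# Run through all 2**(n-1) subsets of the plus signs, coded as bitmasks.
seen = {tuple(composition_from_subset({i + 1 for i in range(n - 1) if mask >> i & 1}))
        for mask in range(2 ** (n - 1))}
print(len(seen))  # 16 = 2**(n-1): every subset gives a different composition
```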
Exercise 7.1. How many solutions of the equation x₁ + x₂ + ⋯ + x₁₀₀ = 2012
are there, in which x₁, x₂, . . . , x₁₀₀ are positive integers?
Exercise 7.2. Let's write n = 'dₘdₘ₋₁ ⋯ d₂d₁d₀', where 0 ≤ dᵢ ≤ 9 for
0 ≤ i ≤ m and dₘ ≠ 0, to mean that n is an (m + 1)-digit natural number
and the given string of digits is its decimal representation. Prove that n and
d₀ + d₁ + ⋯ + dₘ₋₁ + dₘ leave the same remainder when divided by 9.
Exercise 7.3. Does there exist a positive integer N which is a power of 2,
and a different positive integer M obtained from N by permuting its digits
(in the usual base 10 representation), such that M is also a power of 2? Note
that we do not allow the base 10 representation of a positive integer to begin
with 0. [UK Maths Trust, Mathematical Olympiad for Girls, 20 September
2012.]
Exercise 7.4. (a) Is it true that every natural number may be expressed as
a sum of three square numbers?
(b) Does there exist N ∈ ℕ such that for all n > N, n may be expressed as
a sum of three square numbers?
7.2 Problem-solving: a summary
I wrote above that there cannot be rules and recipes for problem-solving in mathematics. I
also offered some suggestions: experimentation, pattern-spotting, conjecture, followed by
refutation and return to square 1 , or verification, writing-up and celebration. You are on
your own. Learning the definitions and theorems, learning to appreciate the theories that
mathematicians have developed over hundreds of years, learning how to apply them —
these are the most important ingredients. Add native wit, low cunning and initiative.
Enhance the mixture with experience — that is to say, much practice. Persevere. If at
first you don't succeed, pause, do something else, have a coffee (remember the Hungarian
view that a mathematician is a machine for turning coffee into theorems), come back to
the problem and try again. Remember Samuel Beckett's words:
n
Si(n) = \n{n + 1) , and ^(n) = \n(n + l)(n + 2) . What can be said about
S3 (n) ? Prove that in general S r (n) may be expressed as a polynomial of degree
r + 1 in n. What is its leading coefficient?
No matter. Try again. Fail again. Fail better.
That's the way to learn. That's the way to success.